Leveraging Randomized Clinical Trials to Generate RWE for Regulatory Purposes – Day 1

By Adem Lewis

(audience chattering) – Good morning, good morning. I’m Greg Daniel, I’m Deputy Director at
the Duke-Margolis Center for Health Policy and I’d like to welcome all of you to our public workshop today, Leveraging Randomized Clinical Trials to Generate Real-World Evidence
for Regulatory Purposes, which is being hosted by
the Duke-Margolis Center for Health Policy and supported by a cooperative agreement with the FDA. I’m not sure I really need
to remind all of you why we’re here today, as many of
you have been with us and FDA in forums like this one over the last several years. But in short, an exploding amount of real-world data, paired with new and increasingly sophisticated methods and analytic approaches for turning that real-world data into real-world evidence, has led to a growing call to make these data and evidence actionable for a wide range of health care decision-makers, including regulatory authorities like the FDA. FDA’s directive to establish
a real-world evidence program was codified in
the 21st Century Cures Act and the PDUFA VI authorization, which outlined a path for the agency to explore the use of real-world data and real-world evidence to, one: help support the approval
of a new indication for a drug already approved
under 505(c) of the FD&C Act, or two: to help support or satisfy drug post-approval study requirements. To that end the FDA has engaged
with many of you to better understand the potential
challenges of using real-world data and real-world evidence in a
variety of regulatory contexts, and ultimately how to overcome
some of those challenges. A framework published by
the FDA in December of 2018 describes many of these challenge areas and key considerations for
evaluating real-world data and real-world evidence
and their roadmap for moving toward eventual
guidance in this space. As all of you know this
includes a wide range of issues related to data
quality and relevancy, study designs and analytic methods, and how best to potentially meet well-established regulatory requirements. Today’s discussion will focus in on one specific part of the puzzle, the use of
real-world data generated during randomized studies
embedded in clinical care settings to generate real-world evidence
for regulatory purposes. You’ll hear momentarily
from our FDA colleagues that there are a wide array
of potential uses of real-world data to improve other parts of randomized study designs
and randomized clinical trials, such as improving the
efficiency of identifying and recruiting patients, or
for utilizing new technologies to take traditional,
randomized clinical trials to the patient through things
like remote monitoring. But for the purposes of today’s
discussion and tomorrow’s, we are discussing how to
design and conduct a study that randomizes patients within
their clinical care settings, follows those patients prospectively, and collects real-world data
to generate real-world evidence that supports a regulatory application as outlined in FDA’s framework. Throughout the day we’ll be
highlighting actual trials and case examples that have
been or are in the process of being implemented in
clinical settings to further illuminate the challenges in this space. While not all of these trials
were designed to address regulatory questions
they do provide concrete examples of how trials and trial teams have dealt with relevant
study design issues. Importantly the discussion
in this forum today will help to further
refine FDA’s RWE program and build the foundations
for future guidance. Just to provide a quick
overview of the agenda for today and tomorrow and then I’ll turn to a few housekeeping items. We’re gonna start the day
with a keynote address by Jacqueline Corrigan-Curay, the Director of the Office of Medical Policy at CDER, FDA, and she’ll be providing some framing for the day’s discussion, as well as FDA’s motivation for convening
this public workshop. We’ll then have two opening
presentations that will explore key considerations for randomized
designs at the point of care and the landscape of
approaches currently in use. We’ll then move into a series
of panel discussions for the remainder of the day which
will take on different methods and components for implementing
randomized clinical trials that generate RWE for regulatory purposes. The first of those
sessions will start at 9:45 and we’ll focus on intervention selection and study design issues. That’ll take us to our
first break at 11:00 a.m., and we’ll reconvene at 11:15 for a session on outcome measurement
using real-world data. This’ll take us to lunch at 12:30, that will be on your own, we’ll reconvene back here at 1:30 p.m., and start back up on a
panel on key considerations for building in real-world settings. 2:45 will be another break, and then we’ll come back
for a final panel session on causal inferences. So we’ll hear a range of issues, process issues, study design
issues, implementation, but how that all gets to causal inference will be discussed in
that particular session. We’ll end Day One with an Open-Comment Period, and that’ll provide time to
reflect on what was discussed today, we’ll hear and encourage
comments from the audience and the webcast on how FDA
might consider appropriate uses of, and other key issues regarding,
the evidence generated from these trials for
regulatory decision-making. On Day Two, we’ll move our
focus from trial design and methods to relevant regulatory and practical considerations for ensuring appropriate
conduct of the trial. We have three focus areas
teed up for tomorrow. Presentations will cover
and center on the Salford lung study which as many
of you know is not the only but one of the key examples
of a trial that was integrated within clinical settings for
regulatory decision-making. Members of the trial team
will show how they dealt with some of these issues and our
panelists will provide some additional examples to
further tease out key issues. This will then be followed
by a session where we’ll hear from stakeholders on what
issues they feel are relevant for building FDA’s RWE
framework and continuing to build the program moving forward. We’ll end tomorrow with a final Open-Comment Session. Okay, a few more housekeeping items, it’s really humid, so I’m
sweating. (everyone laughs) I had to walk, so… (trails off) Before I hand it over to Jacqueline, a few housekeeping items. I wanna remind everyone that
this is a public meeting, it’s also being webcast online. The record of the
webcast will be available on the Duke-Margolis
Website following the event. This meeting is intended
to spur discussion. We’re not looking to achieve consensus, we won’t be voting, but rather we’ll hear a variety of perspectives on these issues. And as you can see there’s a
lot of reserve time throughout all of the sessions for
comments, questions and then a session at the end for
additional open comments. There are microphones set up in the aisles and we’ll be taking questions from the audience and the webcast. If you do have a question, please send it to the following address: [email protected]. You can also follow us; we’ll be Live Tweeting the event, and many of you in the room can too. Please follow @dukemargolis using the #RWEregulatory, and as a heads-up to
speakers and panelists, Hayley Sullivan will be
keeping us all on schedule and she’s in the front
with some time cards, and lastly, feel free to help yourself to beverages and drinks
outside of the room. So with that, I’ll open things up for Jacqueline Corrigan-Curay, as I mentioned, Director of the Office of Medical Policy at CDER. Okay, thank you (whispers
quietly to Jacqueline). – Okay, thank you Greg, I have good news, this will be the shortest
keynote you’ve ever had. I just wanna say a couple of
things before we get started. First of all, I wanna
thank all of our speakers and our panelists for
joining us and really sharing their expertise
over the next two days. I also wanna thank those
who are here, in person, and those who have joined us virtually, for the next day-and-a-half. I wanna extend my thanks to
the staff in Duke-Margolis who really have done all the
work to pull us here together and prepare us and
especially Adam Kratchker, Morgan and Morgan Romaine. And finally I wanna of course
thank my colleagues at FDA, who are coming here to share
their expertise as well as those who sort of, behind
the scenes brought us here and that’s Doctor Elzarod,
Captain David Martin, and Captain Deanne Perrone. So as Greg mentioned, of course we do have a
congressional mandate in this area to develop a program and evaluate RWE, but I think it’s just as
important to know that we are actually just as interested as Congress is in this. We understand that there are challenges to clinical trials. We understand the need to
make them more efficient and we understand the
need to really integrate and bring clinical research
and practice together. We’ve talked a little bit
about real-world evidence, our definition of real-world
evidence is clinical evidence regarding the usage and
potential benefits and risks of a medical product derived from analysis of real-world data and I think most of us are familiar with sources
of real-world data. By analysis of real-world
data we are referring to study designs, inclusive of randomized trials as well as observational studies. In late 2016, leadership
from FDA wrote about real-world evidence and
what it might provide us. And they said that, “as we adapt the tools and methods of traditional trials to real-world settings we
must consider the components “of such trials that are critical
to obtaining valid results “and minimizing bias. “And although real-world
evidence can be used in multiple “scenarios, the selection
of the appropriate analytic “approaches will be determined
by the key dimensions of the “study design, including the
use of prospectively planned “interventions and randomization.” So our goal over these next
two days is to identify, what are those key dimensions
of the study design and break down the components
of a clinical trial that could be integrated more
closely into clinical practice and thereby capitalize on the
use of data captured every day in these settings. The prospective randomized
trial in this setting has potential to both
generate high quality data, be more accessible to diverse populations, and realize the efficiencies
that RWD has to offer, while still providing an
assurance that randomization will control for key confounders
both known and unknown. You may have noticed that we are not using the term “pragmatic” in this meeting. We understand that by bridging the divide between clinical research
and clinical practice, the resulting design will likely have certain pragmatic elements, such as broader inclusion criteria, an intervention delivered
in clinical practice with follow-up done by
a patient’s providers, and endpoints that are relevant
to patients, and providers, and therefore are captured in
the course of clinical care. However, we also recognize that
different clinical questions may warrant a more or
less pragmatic design, and we don’t wanna focus on
achieving a degree of pragmatism but rather the degree to
which real-world data can enhance the efficiency and
relevancy of our trials. In our framework we promised to provide guidance in this area, and we hope over the next
few days that this meeting will help us all illuminate
relevant concepts of interest. Identification of these
key issues and the design and implementation will
assist us in determining where we need to provide guidance. So I know a lot of you have
been to real-world meetings in the past, we at this
meeting, we were hoping not to talk about the forest, but to
really get down to the weeds. So I wanna thank everyone for being here. I want everyone to roll up their
sleeves on this nice, warm, Washington summer day!
(crowd laughs) And let’s get started! And I’ll yield any remaining
time to Dr. Temple. (paper rustling) (audience applauding) – Okay. So, how do I get my slides up? (objects clattering) There you go. (paper rustling) Ah, I see okay, here you go. There you go. Okay well, good morning everyone! Despite the heat, it’s
reasonably comfortable in here. As everyone in here knows,
there’s tremendous interest in making use of the vast amounts of data that are already collected
in health care systems. Electronic records, claims data,
registries, all that stuff. To do that, to more
efficiently generate evidence of effectiveness, the
greatest interest actually, though not here today, is in the use
of observational data, that is not randomized trials, but fortunately I’m glad to
say, that is not the subject of today’s effort.
(audience chuckles) In previous publications by FDA, there’s a well-known
paper by Rachel Sherman and a whole bunch of us in 2016. We emphasized the difficulties and lack of experience in
using observational data and we’ve got an
interesting study going on but other people can tell you about that. What we urged attention to was using real-world data in
randomized clinical trials and that’s what this workshop
is about, which is terrific. But there are many issues
that’ll need to be considered and indeed, the program shows this and I’m just gonna touch
on a couple of them. So the conference is about
using data from the health care system in randomized trials to
generate real-world evidence as Jacqueline indicated, what
people mean by real-world evidence and the specific study designs to be considered are not well-established, and that’s why the
conference is so important. And the specifics of the data
generated by randomized trials within the health care system
could vary tremendously depending on whether the endpoints that are used are just
extracted from routine care or are actually put into the study as something to be looked at, and I’ll try to explore the
full-range of possibilities. (paper rustling)
Obviously briefly. Before addressing study design issues, I wanna just offer one
note about the effect on outcome of the quality
and precision of the data. Whether, for example, a
person did or did not have an outcome of interest, and the
severity of that outcome, isn’t always obvious, even
for a well-done trial, with hard endpoints and
everybody looking at them. And that’s why in
cardiovascular outcome studies, we often have adjudication
committees to help decide on that sort of thing. And many endpoints are even
more subjective than that, if they’re recorded at
all, like pain, depression, and stuff, you just have to know whether you have it or not, and as everybody here undoubtedly knows, imprecision and noise can obscure facts. That is, if the thing is noisy, you can fail in a superiority study, or get a spurious success
in a non-inferiority study. So precision really matters
and that could depend on the data system, maybe
registries are better than EHR’s or claims data, I don’t know. That’s one of the things
we need to find out about. I’m not sure anybody can read
this but the easiest use of real-world data, the
so-called “Low Hanging Fruit”, and even though this is not
really a major point of this workshop, it seems very
critical to take note of one obvious use of real-world data that doesn’t get enough discussion. And that is by facilitating identification of potential study participants. That is identifying the
people for study using EHRs or claims data; that would be very important for recruitment. And that seems critical to me. You know, I don’t do
trials, we just watch ’em… But what everybody says
is that recruitment is a major impediment to getting the trial done. This could make a big difference. You’d look within the
system and find them. So you could find the
presence of the disease, or condition being studied, that should be fairly easy. Maybe you can get important patient characteristics like demographics, or the past history and
potential enrichment factors, like how long the disease
has been there, how severe it was, whether it led to
hospitalization and stuff, history of compliance with
treatment, you can look at that sort of thing, past outcome
events, heart attacks, strokes, and stuff, relevant laboratory measures… Maybe you can get a lot of what you need to refer to the trial, which would be a big difference. And then having identified people, the actual study could just
be a conventional trial following the screen for
patient identification, which would include
specific study monitoring for effectiveness using
relevant endpoints. You could look at safety
outside the system, that is you could have a
regular case report form. You could, you might
need to get new consent with full information
reported to the patients. You might need more
detailed entry criteria: history, lab tests, and stuff. I don’t know whether that’s
still real-world evidence after you found the patients in the
real-world and then did it. But that’s something to talk about. So as noted, an easy
use of real-world data is to find patients,
which is a major problem. It seems possible at least
that patients could be told about the study and
consented, possibly queried to obtain more details,
beyond the EHR and stuff, to allow enrichment with disease severity, other features like
abnormal renal function, and stuff like that. Anyway, you’d then do a
conventional randomized trial with identified investigators,
scheduled monitoring, regular laboratory data, complete
safety assessments, et cetera… That’s one kind of thing to think about. But at least in some cases,
probably for a marketed drug, where the need for safety
data is more limited, maybe clinic visits could
be markedly reduced by using novel endpoints to provide
some, or even all, of the data. We’re having growing numbers
of examples of these, but they include clinic
visits by telemedicine, which reduces the effort of everybody having to come in, use of on-line PROs, and use of devices, such as smart watches
and others to detect how many steps you took and
all that kind of stuff. So those endpoints are
not EHRs or claims data, but they are, at least
arguably, real-world evidence. As for when you could use real-world data alone, from EHRs and claims, to assess the effect of an intervention, it’s obviously possible for
some very specific endpoints, notably survival, although
there’s always questions about cause-specific
mortality, and hospitalization, but again, there’s always
concern about whether the description of the reason for
hospitalization is accurate. And it’s recognized that the
state of very few symptomatic endpoints will be reliably
recorded, not anxiety, depression, pain,
and all that kind of stuff, so it’s not clear that you can use the system to detect
those kinds of endpoints. The only ones we’ve ever
seen discussions of have been in certain oncology settings,
cardiovascular outcomes, and maybe some pulmonary settings. So when you might use real-world
data remains to be seen. Sorry. Okay. If the drug is not
marketed you probably… (papers rustling)
(objects clunking) Sorry.
(papers rustling) If the drug is not marketed you
probably need safety monitoring that goes beyond what can
be captured in the EHR. So you’ll need observations
periodically, lab tests, actual visits with the investigator, although, conceivably
done in decentralized ways with local lab tests,
telemedicine, and so on… But, in this case the trial will be a more or less conventional trial, but with use of decentralized data. It’ll probably still need
identified investigators with clear monitoring responsibility, again with the potential for
decentralized interactions. But that’s a little more
like a conventional trial. But there are some
cases that seem special. Suppose the treatment
is a one-time treatment, TPA or streptokinase in
a post-infarction trial, you only give it once, or is a maintained, standardized,
and unchanging treatment with late outcomes of interest, such as an adjuvant chemotherapy trial, or a bisphosphonate
fracture prevention trial… There’s nothing much to
think about and change. I mean, there may be
toxicity to watch for. But maybe there’s some
way of incorporating those into the system and just
watching for the outcomes, which should show up in fact
in the health care records. There could clearly be safety issues. Maybe these could
be done without identified investigators as part of a
sort of medical practice thing and could that depend on
the health care system. It seems easier to do those
things in the VA, but maybe there could be standardized
queries sent out periodically? So there are possible hybrid
versions of all these things. And there are examples of
trials that have been done using real-world data or
almost real-world data. I don’t know, it’s hard to tell. It’s funny, I’ve gone
back to look at protocols for the GISSI Study a million times, and there’s just a bunch
of things you can’t tell. You can’t tell how they
consented patients, you can’t tell who the
investigators were, anyway… But in the GISSI Study, 176 Italian coronary
care units took patients who were within 12 hours of a heart attack and they were randomized to
streptokinase or placebo. The primary endpoint was 21 day survival and it showed a nice effect and that was based on registry data. They also got other endpoints later from the local resources, but it’s very hard to tell from reading it and I tried again last night, who the investigators were, what their individual
responsibilities was, was it the whole clinic
that was the investigator… Anyway it’s hard to tell. The TASTE Study, which
everybody knows about probably, is a perfect case for this, and the question was, does intracoronary aspiration of thrombus prior to percutaneous intervention work? People were randomized
to thrombus aspiration plus PCI or PCI alone
in patients with STEMI and the question was, did that work. They used the Swedish Coronary Angiography and Angioplasty Registry system, SCAAR, to find patients within
24 hours of pain onset and then they randomized to aspiration plus PCI or PCI alone. Primary endpoint was mortality which they got from a registry, and then they used other registries, which are available in
Sweden, for other endpoints, like recurrent MI and stent thrombosis… A pretty good example, again made possible by the fact that the intervention was a one-time thing. One that’s going on in this country, the Impact AF Study is a
randomized trial in the Mini-Sentinel Data Partners of the effect of a standardized message to physicians on anticoagulant use in patients with atrial fibrillation who are not yet receiving an anticoagulant. And the study will also
look at stroke outcomes. Again, it’s sort of a one-time thing and you can then let the
system do what it will. It’s also worth noting that
the Peto/Collins proposal in 1995 for large simple
trials goes a long way toward simplifying conventional
randomized trials, bringing them closer to real-world data by minimizing data collection,
exclusions, and so on. And the ISIS trials were
pretty good that way. But they had many remaining
conventional components. There are other issues that we’ll be discussing later… Who exactly is an investigator and what’s their responsibility when you do something in a real-world setting? Almost all trials, even real-world trials, involve investigators
going beyond usual care, maybe except when there’s a single dose, as I said before… How does this fit with the IND
rules and responsibilities? And then a very important
question always is, the consequences of decreased
rigor in assessing outcomes. As I said before, noise obscures; much of what is sought in pragmatic trials could lead to less precision: some patients may not have the disease, or the desired severity of disease; some endpoints are
missed or are erroneous; patients are lost to follow up; patients stop using the drug
or take other similar things. All of those problems
decrease study power, a bad outcome in a
difference-showing trial, and a credibility problem
in a non-inferiority study. I didn’t do this very well. But I mean, one of the things that people talk about all the time is pragmatic trials. And when you go back to the PRECIS presentation by Zwarenstein and others, you look at all the various
things they wanna do differently and most of them aren’t
gonna help very much. One is, one that they say is, you don’t want perfect
compliance in a pragmatic trial. Well that’s not right, of course you do. You don’t want… Do I wanna know whether a drug still works if nobody takes it? No I’m not interested in that. I already know the answer. It’s not important. So you need good compliance in any trial to find out whether the drug works. And then the other features, another principal feature of pragmatic trials is that they have a broad range of people. Well we’re all for that. We’ve been trying internally
to get people to include older people, people from
a wide variety of races, and all that kind of stuff, so we share that feature
of the pragmatic trial. But most of the others, sort of just get in the way. But it’s one of the things that probably should be discussed. But I’m… I was (chuckles) going back and looked at a lot of the
writings about pragmatic trials, and I confess the benefit doesn’t really seem entirely clear to me. If patients don’t take the drug
the effect will be smaller, I know that, I didn’t need
a trial to tell me that. If the patient populations
are poorly defined, if they don’t really have the disease, all those things lead to failure
to show that the drug works and I don’t see what
the benefit of that is. Explanatory trials, I
don’t even know whether explanatory trials do
give different results, or how much documentation
there is of the difference between the so-called pragmatic
trial or explanatory trial. And I have an old
interchange with Sean Tunis about whether there have been any showings that this has happened and there are, according to Zwarenstein, very few, but they’re interested in looking. So the whole issue deserves some discussion if we’re gonna talk about it. But I’m extremely skeptical. The main difference I see
between the two is compliance, and I already know if people don’t take the drug it won’t work. So I’m not sure how much
you get out of that. Anyway, so the discussion of
theoretical and practical ways to make use of real-world
data to gain evidence of effectiveness, that is… Is that real-world evidence, maybe? In randomized trials will
cover a wide range of issues, including: data quality,
clinical relevance, practicality, and much, much more. And I, and I’m sure everyone else is looking forward to this
exciting program, thanks. (papers rustling)
(objects clattering) (audience applauding) – I think we’ll have… I think I’m gonna have
you sitting over here. – So, I sit over here?
– Yeah. Okay, thank you Bob. That was Bob Temple, Deputy
Director for Clinical Science at the Office of New Drugs CDER FDA, and I’ll also, in the meantime, you know, this humidity’s been
bothering our sign as well. (audience chuckles) Introduce Lesley Curtis,
Chair and Professor in the Department of
Population Health Sciences and Interim Executive Director at the Duke Clinical Research Institute. Lesley?
(papers rustling) – Great, thank you! Thank you, Greg, and it’s great to be here, not just this morning, but for the next day-and-a-half. I think this is gonna be a
terrific, terrific workshop. And what I wanted to do
this morning is just provide the group with some of the
experience that we’ve actually had in the real-world in the
setting of randomized trials that have all leveraged
real-world data or in one case, using real-world data
alongside a clinical trial. So I wanna start by just touching briefly on the evidence-base that I’m
drawing from for my, sort of, lessons learned or general
experiences that I’ll talk about. The first is the NIH Collaboratory, which some of you may be familiar with, an NIH-funded network
collaboratory that was created almost a decade ago, to strengthen our capacity to
do embedded clinical trials embedded in health care delivery systems, embedded pragmatic
clinical trials, and really engage health care delivery
organizations as partners. Importantly the collaboratory also really tasked the Coordinating Center, which I am one of the PIs along with Adrian Hernandez and Kevin Wineford, tasked us with creating
generalizable knowledge about how, how to do embedded
pragmatic clinical trials. So currently there are about
15 demonstration projects, all randomized trials, they all have a one-year planning phase, and then an implementation phase. So just to give you a sense, these trials that are on-going, really reflect a variety of interventions and severity of participants. So just to anchor you with a… At the upper-left and lower-right corners, the HiLo Trial is one of our
newest demonstration projects, looking at whether less
stringent control of serum phosphate yields non-inferior
all-cause hospitalization rates in an ESRD population, and there at the lower-right-hand of this, and these are all of the 15
trials that are going on, in the lower-right-hand, the GGC4H Trial, not a catchy acronym, I’m afraid, but the GGC4H Trial is looking
at whether an educational intervention, a curriculum
aimed at the parents of early adolescents is associated
with lower health care utilization and lower
behavioral health challenges for those adolescents
in subsequent years. So quite a range of trials here. The other evidence-base that I draw from is the work that we are doing alongside a now-completed
cardiovascular outcomes trial, the Harmony Outcomes Trial, which was an event-driven trial looking at the effect of Albiglutide on
major cardiovascular events. We, colleagues of mine and I at Duke had an ancillary study
alongside that trial, partially funded by FDA in
addition to funding from the sponsor GSK, to really look
carefully at real-world data in comparison to the data
generated and collected, as part of the traditional trial. And then the final source of
evidence for my experiences, lessons learned comes
from the Adaptable Trial, which I suspect many of
you will be familiar with, a trial through the PCORnet Network, looking at high-dose
versus low-dose aspirin with respect to secondary prevention of cardiovascular events. You know, we’re especially excited that, just a couple of weeks ago, we enrolled our 15-thousandth participant, which was what we were heading for. And we have a few more since then. So very, very exciting there. So as I reflected back
on this evidence-base and what we’ve learned as
we have done real-world evidence development in the context of the real-world of health care delivery, I just have a few sort
of summary experiences that I’d want to share with you, and the lessons that we’ve learned. The first and probably
the most sobering one, that I think we’ll want
to really keep in mind throughout the discussion
today and tomorrow, is that change is an absolute
constant in the real-world settings in which we’re
aiming to do these studies. So the health care systems, as we know they’re complex, they constantly change, they are not alike. And that had implications for everything, from the way we design the
trial, to how we recruit, to the intervention itself. Certainly through the
collaboratory we’ve seen how often leadership, providers, and staff turn over, and that really has implications. Again, to the extent
that your intervention relies on continuity at that level. Importantly, and Bob talked
a lot about real-world data, importantly the underlying
data and IT systems, in these health care delivery
systems are not static, and they change, and often change, in response to needs on the
health care delivery side, or needs on the reimbursement side. So again, making sure
that we have an ability to track those changes, to monitor those, and to know, how those changes
might impact the research, the evidence that we’re trying to generate is really important. You know, it can even be difficult and we found this in the collaboratory to maintain a stable control
arm in the real-world. Initiatives, competing initiatives arise, this has been a real issue
for one of our trials that is
focused on pain-management, and as we all know that
has become a huge issue, the opioid crisis has
brought that to the fore. So imagine doing a multi-year
trial when that comes up, and all of the changes
that go on in health care delivery systems to address
that public health crisis. And it can be the case
that changes in usual care may be unethical to control. So imagine, for example, when a new guideline comes out and really resets what
standard practice should be and what happens when that comes out in the middle of the trial. So these are the kinds of
considerations that we, I would say, from the
work that we’ve done, have been very accustomed
to staying attuned to. Really important for us to
all recognize that the work that we’re talking about and the work that we’ll be talking about over the next day-and-a-half, it really requires a team, and a team of multi-disciplinary experts. Certainly engagement of clinicians, engagement of health system
leadership is critical when we are doing work that is
embedded in care delivery systems, and those partnerships
are not just partnerships with the leadership, or the top-tier, if you will, of those organizations, but it’s really required at all levels, and in fact, I think
many of our colleagues who do these trials would argue having that frontline support for the work that you’re doing may be most important, and that can be especially time-consuming, though also rewarding. But that’s a lot of engagement, if you will, and that can result in the need for more investment in training and certainly retraining, as there is turnover. Certainly from Adaptable
we have learned, I think, the very exciting lesson about how valuable participant-engagement is. And those participant
perspectives can really uniquely inform the way we approach
design, recruitment, and implementation. The adapters, the engaged group of participants in the Adaptable Trial, have really had considerable impact at every stage of that trial. And as I know, we’ll hear and talk about, again later today, really engaging our colleagues on, in those ethical and regulatory systems is very important as well. There are critical issues that
arise throughout the trial, and certainly in the design phase, that really warrant discussion early on, certainly for the collaboratory, we have those ethics reviews,
ethics consultations and, in some cases, regulatory
discussions very early on, so that we are all aligned and know that the systems are ready for
the trials that we’re doing. And then it probably goes without saying, but I will say it anyway, that when we use existing data, when we rely on existing data and systems, we know why we’re doing it, and there, we believe that
there are some real advantages to that and Bob spoke to some of those, but they will almost certainly add complexity to the work that we’re doing. It’s not just that those systems change and we have to be aware of those changes when we’re using the data
generated by those systems, but integrating study data
elements that we may need into existing workflows, into existing electronic
health record systems, that actually has
implications for workflow, has implications for compliance, and those can be more challenging
to do than we might think. Again, certainly worth
it in many circumstances, but we shouldn’t be
naive about the impact. Oftentimes when we talk
about using real-world data, we recognize that outcomes
may be incomplete, and we often talk about
the ability to link data to create that full outcomes perspective. I would say, in our experience, the technical aspects
associated with linking data are relatively straightforward; the governance around that tends
not to be straightforward. So again, to the extent that complete outcomes require data-linkage and more, we should make sure we are clear about those governance issues going into it. And finally, and a topic
that we’ll dive into, I believe a little bit later today, the data that we need may not be available in a timely manner, and that latency can vary
considerably by site. Moreover, the real-world data that we get may not yet be curated for our uses. So again, that adds to
some of the delay as well. You know, just sort of pulling
all of this together then, as we think about the agenda
and our time together, think, let’s just keep in mind, maybe three high-level points
and come back to these, and as Jacqueline and Bob said, as we dive into the weeds, and not just talk about
the trees in the forest, this dynamic environment
that we are talking about, that we are operating in, the real-world, it has real consequences for
the design, implementation, and monitoring of trials. Engagement is absolutely essential, and it really can both mitigate risk, and increase the likelihood of success, which is what we want. And finally, the last point that I made, that the reliance on real-world
data has real benefits, we believe, but it does introduce complexity, and we need to be mindful of that. Thank you. (knocks)
(audience applauding) – Great, thanks. Thanks, Lesley and Bob. So we are gonna take a few
minutes for Audience Q-and-A. So if any of you have questions, or comments based on
what Bob, or Lesley said, please feel free to join us. We have microphones there
and microphones there. And I’ll ask you to state
your name and organization, Adrian.
– Adrian Hernandez from Duke, and (clears throat) so Bob, I’m glad you gave a
very optimistic view of randomized trials and real-world evidence. I guess, a few comments
for you to react to. You know, one is you know,
every day we’re losing clinicians, in terms of research, and in part, it’s because of the burden of what they have to deliver in terms of clinical care on
electronic health records, but also the burden, administratively, of filling out all the different forms, et cetera, for every study. So the notion of a site
investigator continuing on in this fashion is actually
really problematic, especially when health care is
actually going towards teams of people caring for patients
or actually caring for populations outside of the
health system, remotely. So all the things you talked about, in terms of compliance
makes us really worry, in terms of, how is that gonna fit in to clinical investigators everyday lives. That’s one comment for you to consider. The second thing is that in the U.S., we’ve seen a significant drop-off, in terms of participation research. You know, at best, most
centers have two-percent of participation for
those who are eligible, yet we have to translate that
evidence to the other 98%, and there are vast differences there. And so every time we add something else for someone to come in and do, to make sure that they are fully compliant with our protocol, while we think that’s
actually adding quality, it may actually limit the generalizability in how we operationalize this. So clinician-engagement, and how can we streamline
things in the real-world? – Well obviously, I don’t know
the answer to those things. That’s one of the things
that is gonna be looked at. It raises the question
of who’s an investigator. I was looking back at
the GISSI trial, okay. What they did was, if you came into one of the clinics that was part of the study, you were randomized to
streptokinase or placebo. I don’t even know who the
investigator was there, and you can’t tell from reading it. Was it, was the investigator
the whole clinic? It couldn’t have been one person, because one person isn’t there 24/7… I don’t know how they did it, but somehow, the unit or something
functioned as the investigator. Within certain systems in the U.S., I can imagine similar behavior… VA, notably, where you know, everybody’s an employee. Maybe they could all be investigators for the purpose of the study. But I think that’s one of the main questions that you have to answer, how do you be sure that the treatment is given in a reasonable way, who’s responsible, who’s looking, and I think that’s one of the questions that has to be answered here. If you can simplify it enough
so that everybody in the unit is de facto an investigator, and
I’m not sure, technically, how you do that, or what
they have to sign and stuff, maybe you can do certain kinds of trials, that would otherwise
be extremely difficult. I think that’s one of the
things we have to learn. It’s also obvious that if you don’t have reasonably good compliance
with the system, you’re gonna not be able to show anything. One of the points it seems
to me, as we’re talking is, one question is, does the drug work, and you need a trial that’s
reasonably rigorous to do that. A completely different question is, what is compliance like in the real-world, will people bother to do this? That’s a very important question. But it’s a different question. I mean, one of my major
obsessions is that current figures are that something like
50% of people put on antihypertensives aren’t on
them at the end of the year. How to change that is an
unbelievably important question. It’s not whether the drug works or not, it’s a different question. But knowing how to intervene
in the system to get people to stay on their lipid-lowering drugs and their antihypertensives is probably the most important single health care problem we have. And it needs to be studied
and the only way to study it, probably, is in a real-world environment. So that’s a somewhat
different question from, does the drug work, but very important and very interesting. And that’s what the Impact AF one is doing, is how to get the large number of people who have AF and aren’t
being treated right, into treatment. That’s not whether NOACS work, it’s whether you can
get people to use them. So they are, you gotta keep the
different questions in mind, and real-world data for the
compliance issues like that, makes total sense. I don’t know if that quite
answers your question. – [Adrian] Well, it gives
me a little hope because, you’re just thinking about Adaptable. You know, there are 15,000
participants, 40 sites, so the top-enrolling sites,
the so-called investigators,
enrolled over 1,000 and 2,000 participants. It actually was a system that did it. In a way, that says those are the models that
we could move towards. – Sean?
– Yeah, hi. Sean Tunis with the Center for Medical Technology Policy and Rubix Health. Always nice to tangle with Bob again, but I got a question for Lesley. So I’m just curious, you know, you laid out quite a number
of learnings and challenges. I’m curious, is there
anything that you see, sort of emerging technologically, or organizationally, or
policy-wise that you know, holds promise for making
any of these things, you know, easier going forward, and what are some specific
things that look like, you know, they’re coming
together that might enhance the ability to do the collabora-type work more effectively in the future? – Yeah, no, great question. I see a lot of promise on the horizon, and even in the near-term. Actually the word that I
would go back to I think, is a word that Bob used,
which is hybrid, right. So making sure that we
leverage whether it is, whether it is technologies,
digital platforms, there’s so many different
opportunities we have. And I think the question for
us all is not to use the thing that is
most interesting, sexy, new, but to make sure that we’re
choosing from the toolbox, those tools that help
us answer the question in the best way that we can. So I think, as we move forward, maybe the biggest learning, maybe I should’ve started
with this is that, hybrid will be the way that
we continue to evolve this, where we pull the best from what is, and not, you know, try to go too far in any one direction. – [Greg] Great. Jesse?
– Yeah, thanks. Jesse Berlin from Johnson & Johnson. So, easy pragmatic
question, not philosophical. For Impact AF and for Adaptable, their cardiovascular outcomes, how are they being assessed,
determined, ascertained? – I’ll start with Adaptable. We’re using a variety of sources to identify cardiovascular events. We use data from the health systems. So that’s reliance on the real-world data, the electronic health record data that are in a common format, we use that. We have a Patient Portal, so patients report events, that we then go look for
evidence from data that we have. We’re pulling in Medicare
claims data, as well, for those events, and really drawing in other, those kinds of sources that we can. On a subset, we are doing an adjudication, a validation of those events. – [Bob] Can I ask? Is the Adaptable design
a non-inferiority study,
– It’s an effectiveness study. And it’s looking at the, it’s a comparison of
high-dose versus low-dose. I don’t, I actually don’t remember. I don’t think it’s
powered as non-inferiority… And I’m looking at our…
(PI murmuring indistinctly) One of our PIs there.
– Yeah, yeah. So I mean, it’s a, detect a difference between
the two doses of aspirin. – Yeah.
– So, suppose you don’t see a difference. How would you know what that means? – If we don’t see a difference? – [Bob] Yeah. You gotta know what the
effect-size would’ve been. You mean, this is
non-inferiority design, right? – [Adrian] It’s what’s… A two-comparison, so–
– Yeah. – [Adrian] It’s not… These are only in terms that
are like a non-inferiority design, but it’s not a classic non-inferiority design. It’s actually
powered to detect a 15 to 20% difference between the two doses. – They need… Could you have more
discussion, I think, anyway. (audience chuckles) – [Jesse] Sorry. And what about for Impact AF? Technically the question
has, really has to do with whether there’s adjudication ’cause like, I know if we tried to do something we’d be asked to adjudicate, so. – Right. So we’re not doing, there’s a validation piece of, but we’re not adjudicating
events in (trails off quietly). – [Jesse] I’ll be back later this morning. – Great, yeah, thank you.
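For context on the power figures discussed in that exchange, here is a rough two-proportion sample-size sketch in Python. The event rates used below are invented for illustration and are not the actual ADAPTABLE design assumptions:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, rel_reduction, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for comparing two proportions
    (normal approximation); illustrative only."""
    p1 = p_control
    p2 = p_control * (1 - rel_reduction)  # event rate under the relative reduction
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical 10% control-arm event rate, for illustration only.
print(n_per_arm(0.10, 0.15))  # powered for a 15% relative difference
print(n_per_arm(0.10, 0.20))  # powered for a 20% relative difference
```

With these invented rates, powering for a 15% relative difference requires notably more participants per arm than powering for a 20% difference, which is why the stated 15-to-20% band matters for the achievable trial size.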
(audience chuckles) Okay, so we are running out
of time for this session. So we’ll go to these
last two quick questions, and over to you.
– Hi. My name’s Mark Luke, I’m
the Director of Therapeutic Performance in the
Office of Generic Drugs. We are specifically interested
in generic drug adoption, and when a generic drug
comes out on the market, there’s greater accessibility to
the product, potentially, and how that changes prescribing
habits, as well as
patient-outcomes of using that specific drug. Does it change the
patient population of use for that drug as a generic
comes out on the market? There are lots of questions, and Bob I wanna thank you
for recently serving on the generic drug research
panel that we had recently, another public meeting, so I know you’re very
interested in this as well. The context of drug-accessibility
and drug pricing, as drugs become more difficult
to access as prices go up, and then the curve as they
become more accessible, as a generic comes out to market… What kind of tools does
real-world evidence allow us to use in that context? I know we have a couple
of on-going studies where we’re looking at
generic drug adoption, but I’m really interested
in partnering with others who might have similar interests. And so please come and reach out. I’m happy to work with
anybody who might have a similar interest in
this arena, thank you. – Great, thanks for the comment. Okay, last question.
– Yep. Magnus Peterson, Cardiologist,
at (audio distorting). So I will leave my
questions for sake of time, but I’ll say shortly that I’m
very grateful that the FDA and Duke brings up this
very important discussion. And one thought that I
think we should consider is, given that only two-percent of U.S. physicians have time
for clinical research, we see similar numbers in Scandinavia, and in other regions, how applicable are RCT
results for their populations? TASTE is an example where, of all patients with STEMI in Scandinavia, 70% were randomized in the trial. So taking pieces from those studies, in a modular approach, I
think, could greatly enhance the representativeness of
the evidence beyond the trial. And it’s very clear that the more that we collect directly by
CRFs from investigators, the more burden on the different sites. So the more that
we could introduce by using existing SCAAR systems
or other data sources, the more we would lower that burden and
hopefully increase interest in clinical research, thank you.
– Okay, great. Thanks and thanks to Lesley, and Bob for these great presentations. – Thank you.
– Okay. (audience applauding) We’re gonna roll right
into our next session. We’ll be discussing the
selection of appropriate interventions and study
designs for randomized trials, embedded into clinical settings. We’ll have a couple of
opening presentations that’ll provide some
examples of key issues, specifically with intervention
selection that need to be considered upfront
within the study design. These presentations will be
followed by a few reactions and comments from panelists. I would like to note that one
of our panelists, Louis Fiore, unfortunately was unable to make it today, due to a family emergency, but we do have also speakers
joining us remotely, as they were unable to travel as well. So as we get started, I’ll ask, you know, any of the presenters or
panelists in the room, to please join me up on the stage. Yeah, okay. Just, okay thank you. Our first presentation will
be dialing in: Martin Landray, who is Deputy Director of the Big Data Institute and Director of Health Data
Research U.K. at Oxford. Following Martin’s presentation, Elaine Irving, joining us here
Elaine Irving joining us here on the stage is Senior Director and head of real-world study
delivery at GlaxoSmithKline. After those opening presentations, we’ll turn to Iris Goetz who’s dialing in as a medical epidemiologist
global health outcomes at Eli Lilly & Company. And then also a panelist
Steven Piantadosi, Associate Senior Biostatistician, Division of Surgical Oncology, at the
Brigham and Women’s Hospital. So with that, Martin
if you’re on the line, let us know and I’ll turn it over to you. – [Martin] Yeah, I’m here. Thank you very much.
– Okay, excellent. Thank you. – [Martin] Okay, well I’m very sorry, I can’t be there in person. And it’s going to be difficult ’cause there’s a huge amount of echo. Yes, I’m sorry not to be there in person, but it’s my daughter’s
graduation tomorrow. She’ll be the third generation
of people to graduate from Birmingham U.K. in my family, so I’m looking forward to that. My background as many of you all know, is really in clinical trials, but over the last five or so years I’ve been increasingly
involved in the data world. And I think I want to illustrate to you, and talk over the next 10 to 15 minutes, of how those two worlds come together. So if you go to slide two, clearly what we really
need to do is to provide robust assessments of
health interventions. I think sometimes, when considering these trials, we need to be particular about our framework. And that starts with a
recognition that most of the treatments we have, and
most of the treatments we will have, are not stunningly good. In fact, they have, at
best, moderate effects. And that’s no (audio
distorting); diseases of middle and late age have developed over many
decades and have multiple causes, and a short-term treatment,
for let’s say five years, on only one pathway, is not likely to have
a substantial impact. And I’m telling you, distinguishing those moderate effects, which can turn out to
be incredibly useful, whether used in combination
or in very large numbers of people, distinguishing
those moderate effects from no effects at all, really requires both randomization to avoid bias and scale to avoid the play-of-chance. Next slide.
that of course clinical trials are in something of a crisis. And I realize that it’s not
always considered polite to talk about money so
early in the morning. But I think we do need to, because I think it’s a major driver of
scientific constraint. And when one considers
that recent trials, for example, of PCSK9 inhibitors
in cardiovascular disease, are said to cost in excess
of one-billion dollars, and that commercial research
organizations say that over 85% of commercial trials fail
to recruit on time and target, there’s really a temptation
to abandon randomization for the lure of observational methods. In other words, to do the wrong experiment because it’s easier, rather than to do the right experiment in a way that is more feasible. And this is really distorting treatment development priorities: early decisions about which
treatments to take forward, moving away from preventive and long-term treatments
for common disease, and instead focusing on very expensive, important drugs, but for
rare conditions. And it seems to me… Next slide…
a huge opportunity just now, and they’re trying to get
how do we take advantage of all those technological
advances in healthcare, and engineering, and communications to facilitate those randomized trials, and get good assessments
of efficacy, and safety? Rather than think harder than ever before, more expensive than ever before, actually randomized trials really should be easier than ever before. Data collection and
communication have never been easier than they are at the moment. Next slide. But I think it’s really useful to think of the opportunities in
four particular areas. One is around efficient recruitment, and I’m gonna come back to that. The second is around assessment
of safety and efficacy, and I’m gonna come back to that. And I’m not gonna touch
particularly on the last two, which are around thinking
about study quality, that certainly starts with protocol design and I agree with some
of the earlier comments, that actually one has to
rethink and go back to basics when it comes to protocol design. There are also, continuously, the issues of software engineering and statistical monitoring, tools that we didn’t have 20 years ago, let alone 40 or 50 years ago, that make even relied
systems (audio distorting). And finally, a huge opportunity
(mumbles) engagement, not only for collecting information, if you like, for our benefit, but actually for
making sure that there is proper information sharing, that
consent is on an on-going basis, and that one can actually
communicate emerging information about safety or progress of the trial, all the way through the
trial, and well beyond. To give you just one example of that, in our last trial, because
we had (audio distorting) the names and addresses of the
7,000 surviving participants, we were able to mail the
results directly to those 7,000 participants and all their
doctors within one week of trial-completion and the
announcement of the results. That’s the sort of opportunity we have with modern communications, to the point where we can drop some of our preconceptions about some of the barriers. Next slide. If we think about the
recruitment pathway specifically, the diagram in the middle,
with the squares and the circles, takes us through a typical journey. You begin with a research question. Think hard about that research question, about protocol design, feasibility, finding the right patients, inviting them, perhaps pre-screening
them, and consenting them. And all of those things of
course could be done on paper, and could be done manually
and typically are, but actually there are
opportunities, highlighted where there is (audio distorting), to
streamline this process and get huge value from the
tools which are available to us. And I want to just give
one example of that, focusing on the feasibility piece, ’cause clearly when
one’s planning a trial, the very first question really
should be (audio distorting) in this research question: do these patients really exist, are they in sufficient numbers, and where are they… And those sound like simple
questions but it’s remarkable how often they’re ignored and how often people’s trials fail because they’ve ignored them. So we get to the next slide… This is an example from
an on-going eight-week clinical trial on (audio distorting) and what is placebo which is
only (mumbles) 15,000 people, of whom 12,000 will be
recruited in the U.K., with the remaining 3,000 in the U.S. Everything on this slide
is in the Public Domain. This is a trial that (audio distorting) people who’ve had a prior heart attack, stroke or
peripheral vascular disease. And usually a version of
Medicaid (audio distorting) they do a claims data in
this case (audio distorting) diagnostic codes and procedural codes when they must convert
those (audio distorting) criteria (audio distorting)
diagnostic criteria. Now there’s always been a
criteria (audio distorting) but not by chance. But because we’ve spent the
time stripping away the four pages of detailed eligibility
criteria that were previously considered to be
standard medical practice, identifying patients who
are highly likely to get the events, the cardiovascular
events which we want new treatments to prevent. So taking those codes and providing them to
the National Data System, the equivalent of the NHS’s All England IT Support Department, we were able, at massive scale, to come up with the information
on the next slide. And on the left-hand side you
can see the top 26 hospitals, each had over 20,000 patients
with cardiovascular disease, totalling about half-a-million
patients in total. So cardiovascular disease
is not a rare disease, we all know that. Why do we approach the treatments as if this were a
rare disease problem, and in a manual way? I think the (audio
distorting) stupid data. And on the right-hand side, in
case you can’t recognize it, a map of, at least part of the U.K. This is all English data. The dots represent hospitals. The bigger the dot, the more, the greater the number of patients. And there’s a little bit of
information overlaid on that, based on (audio distorting)
other considerations about whether they were going to participate. So you can see that at massive scale we’re able to identify hundreds of thousands of potentially
eligible patients. Identifying where in
the country they were, and how they can be recruited. And what we’re able to do, in fact it’s a turn that
information which at this stage is anonymous information
we’re actually able to
convert that information into real names, and real
addresses, and real people. And then invite them to join this study. On my next slide, I haven’t got data to show
you unfortunately (mumbles) there aren’t four trials, I’ll have to go back to a previous study that we’ve been involved
in (audio distorting) which of course is an
observational cohort study of half-a-million people. The process is exactly the
same as I’ve just shown you. We wrote to 9,000,000 people, middle-aged people in the U.K., generated from the NHS register. Half-a-million of them said
yes and volunteered to attend 22 centers in three-and-a-half years, and you can see that using
these methods can easily recruit very large numbers of patients
with common conditions. Or indeed actually, with rare
or medium rare conditions, hundreds of thousands of people
within (mumbles) disease, tens of thousands of people
(audio distorting) with Parkinson’s Disease in the U.K. Which as you might recognize is a little smaller than the U.S. I want to turn from that which at the beginning of this study, I’m hoping I’ve illustrated
(audio distorting) you need to think about
scale as our friend, not as our enemy. And now think about what we
can do in terms of follow-up. And there are key methodological
considerations here, and one of them is that
the enemy of high-quality randomized evidence is loss to follow-up, and particularly eventual
loss to follow-up. I wanna (audio distorting) really routine data: once you get efficient, very large scale real-world information can be captured in a relatively cheap and potentially
non-touch approach. It can be comprehensive and
it could be durable (mumbles) a lot of information
on efficacy and safety. So during the trial and beyond the trial. I think (audio distorting) already highlighted
some of the weaknesses. They’re not technical,
the technology is easy. The weaknesses are around
acceptability of the records, around information governance and the security challenges. They’re political with a small p, not technical considerations. There are some issues around accuracy; not all events are coded. For example, some events
don’t go into hospitals (audio distorting) and then there’s an
issue around confidence. (audio distorting) not all regulated. I’m convinced that these are sufficiently robust for purpose. And the rest of my talk is really focused on trying to consider
some of those weaknesses and persuade you that
with both theoretical and practical evidence, this information is robust. Next slide. So turning to the theory. Let’s imagine you do a
trial of 10,000 people (audio distorting) 10,000
people active versus control. And that the real picture,
the true picture is 800 events versus 1,000 events (audio
distorting), actually a 22% reduction, and a highly significant P value. A success in anybody’s
books, which changes practice (audio distorting)
licenses (audio distorting). Let’s stop and imagine that is actually the true effect of the treatment. Next slide. Let’s imagine also now the
fact that the reality is that some of the events
we collect are not really the events that we were interested in. Say, for example, we included a little bit of
angina in amongst the events which we had hoped to be
capturing (audio distorting), or a little bit of (audio distorting) when you really wanted stroke. Or let’s imagine that gets as bad as 20% of the information we capture: we capture an extra 20%
that is not true events, all of course evenly distributed
across the two treatment arms (audio distorting). The reduction remains
basically unchanged, the P value remains unchanged, and the implications for
practice remain unchanged. Next slide. Conversely let’s imagine
that we miss some events, in an unbiased fashion, this is not in one arm or the other, because we’ve got a
comprehensive collection of data, but we just happened to miss some events, for example ’cause not all
events get into hospital. And here you can see that were we
to lose, say, 20% of events, this time evenly distributed, where even the ones we miss were also affected by treatment, you again have
virtually no impact on the reduction, and no impact on
the statistical, clinical, and regulatory significance. I wanna use that model to really emphasize that for these randomized
trials at scale, the data need to be adequate but not perfect. Chasing around after
individual precision, and individual data points
(mumbles), is actually a futile exercise
and simply not necessary. So that’s the theoretical;
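The adequate-but-not-perfect argument can be checked with simple arithmetic. The sketch below uses the 800-versus-1,000-events example from the talk; the equal per-arm size of 10,000 is an assumption for illustration:

```python
import math

def risk_ratio_and_z(events_a, events_b, n_per_arm):
    """Risk ratio and a two-proportion z-statistic for equal-sized arms."""
    p_a, p_b = events_a / n_per_arm, events_b / n_per_arm
    pooled = (events_a + events_b) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    return p_a / p_b, (p_b - p_a) / se

n = 10_000  # assumed per-arm size, for illustration only

# True picture: 800 vs 1,000 events
scenarios = {
    "true": risk_ratio_and_z(800, 1000, n),
    # Capture an extra 20% of non-events, evenly split across arms (180 each)
    "diluted": risk_ratio_and_z(800 + 180, 1000 + 180, n),
    # Miss 20% of events at random in both arms (missed events also affected)
    "incomplete": risk_ratio_and_z(640, 800, n),
}

for label, (rr, z) in scenarios.items():
    print(f"{label}: risk ratio {rr:.2f}, z {z:.1f}")
```

In each scenario the estimated risk ratio stays close to the true 20% relative reduction and the z-statistic stays far beyond conventional significance thresholds, which is the sense in which routinely collected data need to be adequate rather than perfect.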
let’s turn to the practical, next slide. (audio distorting) about adjudication, this is information from the
REVEAL trial (mumbles) which I ran, which is 30,000 patients
under my request alone or not. You can see that we collected
159,000 pages of documentation on 16,000 events that were adjudicated by three adjudicators, 17
assistants over three years. And this, by the way, is the streamlined (audio distorting) adjudication (audio distorting) costs. And the results before we started that process I won’t show you. And if you go to the next slide, the results after we finished that detailed adjudication I will show you. And if it’s possible to go back and forth between those two slides, you’ll see that there’s frankly no significant difference, and I mean (audio distorting) significance statistically, clinically, and from regulatory or other perspectives. Adjudication did not change the character of the results; it simply cost us money, and it tends to give us (audio distorting) a smaller study and (audio distorting) valuable information on efficacy or particularly on safety. Now this is not one unique example. If we go to the very next slide, this is another analysis, not mine, this is the Cochrane
Analysis on the effect of adjudication on the estimate
of treatment effect. And all you need to look at is the bottom line, the bottom dot, which you can see is (audio distorting) on the effect line. In all of those studies, adjudication made no difference, no impact at all. Now to be clear, I’m not saying (audio distorting) adjudicate; there are clearly times
when you want to distinguish (audio distorting) going to be effective in opposite directions, for example (audio distorting) in the study of (mumbles), but one needs to be very targeted about when that’s the case. Now that was based on what happens if you do an ordinary study, that sort of standard trial (mumbles) study. What I want you to look at now is what happens if you do a trial using real-world data. And so if you go to the next slide, which is the (mumbles) study of aspirin versus placebo, and fish oils at a particular dose versus placebo. The results in black are the original trial results; it was, by the way, an incredibly streamlined trial in itself, but the adjudication process was much as I just described, as much as you’d expect. And those are the full results. In the sort of grayer outline just below is actually what
happens if you switch out all those adjudicated data, ignore them completely, and simply use the English equivalent of Medicare data. And essentially there is no difference. There’s a very minor difference on (mumbles), which is one of the components, and that’s because (audio distorting) doesn’t get admitted to hospital, as I emphasized in one of my earlier points. The point is that we get robust information simply by using randomization and this level of follow-up. Next slide.
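Martin's back-of-the-envelope model, that non-differential noise in event capture barely moves a randomized comparison, can be sketched with simple arithmetic. The event rates below are illustrative assumptions, not figures from any of the trials he describes:

```python
# Illustrative event rates: 10% in the control arm, 8% on treatment.
true_control, true_treatment = 0.10, 0.08
rr_true = true_treatment / true_control            # relative risk = 0.80

# Case 1: miss 20% of events non-differentially (same capture rate in
# both arms). Both rates shrink by the same factor, so the relative
# risk is untouched.
capture = 0.80
rr_missed = (capture * true_treatment) / (capture * true_control)

# Case 2: add false events evenly in both arms, e.g. a 2% chance that
# a patient without a true event is wrongly recorded as having one.
fp = 0.02
obs_control = true_control + fp * (1 - true_control)        # 0.1180
obs_treatment = true_treatment + fp * (1 - true_treatment)  # 0.0984
rr_noisy = obs_treatment / obs_control

print(f"true RR       {rr_true:.3f}")
print(f"20% missed    {rr_missed:.3f}")  # identical to the true RR
print(f"false events  {rr_noisy:.3f}")   # slightly diluted toward 1.0
```

Non-differential false positives dilute the estimate slightly toward the null (0.80 becomes roughly 0.83 here), so if anything they make the trial conservative, which is the sense in which the data need to be adequate but not perfect.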
(man clears throat) And one doesn’t have to limit oneself to just within the trial period. This is now the 20-year follow-up of the (audio distorting) study of pravastatin versus placebo. It’s actually quite a small study initially, and you can see the long-term (audio distorting) benefits of pravastatin for the patient. And there’s a lot of
work done to make sure that those who did the study understood the false-positive and false-negative rates on individual data points. But individual data points don’t matter in this context. We’re not trying to diagnose (audio distorting) in individual patients; it’s not about diagnostic ability, it’s about the robustness of a treatment effect, (audio distorting) during the trial and for many years later. And that really changes the clinical interpretation and the cost-effectiveness interpretation of these sorts of results. Next slide, and this is,
I think, pretty much, my final slide, par to summing up. This is to take a different setting. Everything I’ve shown you so far, you may say, well that’s wonderful, large scale, cardiovascular trials, ways, oodles of patients (stammers) that’s an English word
for lots of patients. I get oodles of patients
and they got chronic disease and that’s all straight-forward, but what about something more challenging. This is a study of 800 patients
with kidney transplantation. (stammers) A study that was
(audio distorting) at the time someone was transplanted, which of course is an emergency (audio distorting) or acute event, usually driven by the availability of the organ, and then followed by chronic (audio distorting) immunosuppression. And all data collection was driven by routine information: routine information on hospitalization from our equivalents of Medicare, routine information from
the (mumbles) registry, effectively the equivalent of the United States (mumbles) data systems database. So that’s (audio distorting) in the U.K., with (audio distorting) being that registry, and the information from the NHS (audio distorting) register, and they’re the people who actually manage the availability of organs and transplantation across the U.K. And the thought was, let’s use information
from those. And we were able to identify significant improvements in (audio distorting) function, where one was expecting them; greater (audio distorting) transplant rejection (audio distorting); and greater risks of (audio distorting) infection, which nobody had spotted before. Just using routine data, in what is really quite a rare-ish condition. This is half the patients transplanted in half the centers in the country during a period of (audio distorting) about two years. So you really can get (audio distorting) robust information, even in the context of relatively rare conditions. So final slide: (audio distorting) we have to reinvent our concepts of randomized trials for the 21st century, and of course we’re well into that 21st century and only just waking up to it. We need the data, the technology
to drive the feasibility. We have to stick to the
principles of randomized trials. And we have to then
modify our approaches to regulation and governance
to ensure that they’ll be able to benefit patient
we’re all in it for. Thank you very much.
– Great. (audience applauding) Thanks Martin, that was
a terrific presentation, we could hear you very well, and congratulations on the
graduation in your family. We’ll turn things over to
Elaine Irving, Senior Director, Head of Real-World Study
Delivery at GlaxoSmithKline. – This the one? Okay. So thank you for the opportunity
to join the workshop today. I’d just like to spend a few
minutes to give you a flavor of when as a sponsor,
we start to consider, despite the challenges
and the complexities that Lesley’s already described, we really think there’s
a value and a place for randomized controlled trials in a routine care setting. I’m using an on-going study as an example, just to highlight the importance of really being clear on what the research question is you’re trying to answer with that trial, and keeping focused on that as you go through the complexity and the tough decisions you need to make when you’re actually designing a trial like this. And again, to Lesley’s point, not getting carried away with
any new fancy technology but rather leverage that where it’s appropriate to enhance the trial. I’ll then just finish really
briefly touching on a European initiative that I lead called,
The GetReal Initiative, which is focused on driving
the uptake of real-world evidence into decision-making
and my colleague, Iris Goetz, will follow with some more detail on the particular task force within that project, which is very much focused in this space of randomized controlled trials. So as we progress towards licensing, and once the medicine has been licensed and it gets out into the routine care setting, there are a number of different factors that we’re all aware of that can very much influence how that medicine behaves, whether that be differences in the benefit perceived by patients or indeed in the safety. And so as we’re starting to approach the point of registration, we’re really starting to think ahead about what are the gaps in our
knowledge about how the medicine behaves, and what are the factors that will really impact our medicine as it enters into the routine care setting. And if we start to think about
the population, for example, that has been used in the randomized controlled trials: as we’ve already heard, due to the stringent inclusion and exclusion criteria, very often the patient population
included in those trials doesn’t fully represent
the patient population that will eventually
receive our medication. The way the intervention has been given in the trial, again due to changes in guidelines or existing guidelines, or due to blinding in the trial, may not be the way that the intervention is used ultimately in the routine care setting. So it’s important for us to
work through those aspects and understand the questions that our stakeholders might have. And by stakeholders I mean regulators, who may need to have further safety information. It may be health technology assessors, who may be really struggling with trying to identify, you know, the value of the new medicine versus what’s already existing. Or it may be the physician faced with a patient that has many comorbidities that would’ve been excluded from a randomized controlled trial. And while a lot of these
questions traditionally have been answered using
retrospective studies, for example, or observational studies, I think there’s a real opportunity for randomized controlled trials
in the routine care setting at this stage and to provide
data that’s as scientifically robust as we can and also
gives us the opportunity to bring that data collection
much earlier in the overall development journey of the medicine. So if we turn to INTREPID: TRELEGY is a once-daily triple therapy that has just recently been approved for use in patients with moderate to severe COPD who are uncontrolled on their current therapy. And if you think back to what I just said about the patient population, for example, in COPD it’s well-documented that probably only approximately seven percent of patients that actually suffer with COPD would be eligible to be included in clinical trials. So we’ve got to be aware that
when we enter into the routine care setting the patient
population that’s eventually going to receive this medicine
is much broader than that with what we would’ve seen
in the clinical trials. Another major piece here really
is around the intervention. When you think about a double-blind randomized clinical trial with a therapy like TRELEGY, where it’s a once-daily, single-inhaler administration, your comparators are multiple inhalers. So any patient involved in a clinical trial for TRELEGY, even though they may have been randomized to TRELEGY, will have had to utilize dummy inhalers as well throughout that study. So any benefit that we think
a patient may have through being able to use a single inhaler in a day, we can’t really test that in the randomized controlled setting. And we know that’s a crucial factor in the control of exacerbations in COPD. So it seemed very sensible to go ahead and design a randomized controlled trial in the routine care setting to test some of these aspects. And that sounds very easy when you say it. And then you start to
get into the details: well, how do we do that? And I’m glad that Lesley and others have touched on the use of real-world data to look at feasibility, because that is also something we built into this trial, but we haven’t actually gone through it here in detail. We used it to help identify the patients, to really understand what your patient population is, what their exacerbation history is, and actually that helped us really narrow down the countries that we needed to engage to run the study in. So once we identified those
countries through using the real-world data we then
did some research around, well okay, we want to
look at effectiveness, but what does effectiveness
mean to the stakeholders that will be using the data? So we did some research across some of the health technology
agencies, patients, physicians across the different countries to really understand what endpoints they would want to see
in a trial like this. And it became very clear that
there’s no one-size-fits-all, everybody wants something
slightly different. (chuckles) So again, as a sponsor,
how do you balance that? How do you weigh up all the
opinions of all the different stakeholders that are
going to use this data? We did start to narrow
it down at that stage. It was clear exacerbations, that’s not, you know, unexpected, were a key endpoint. They drive health care utilization and, obviously from a clinical perspective, are the key factor in patient care. But also, patient-reported outcome measures were obviously becoming much more favored, and are included now in treatment guidelines, certainly for the treatment of COPD. And then when you come to the randomization versus non-randomization design: we headed into this journey thinking we wanted to randomize. But actually then we started to think again, and to Martin’s point here, about the fact that we have a diverse population, so there’s going to be variability. We’re comparing a closed triple medicine in a single inhaler to the same medicine but with multiple inhalers, so the effect size was
likely to be quite small. So again, when you start to think about randomization, the numbers start to get really, really high. And then you’re in a position where it’s not feasible to run the study. So we were faced with: do we work with
exacerbation as an endpoint, in which case we would’ve had to go to a pre-post design, because we were going to have to recruit between 30 and 40,000 patients to try to do it in a randomized fashion? Or do we opt towards the CAT, the patient-reported outcome measure, where we could randomize and have all that robustness and control of bias, but obviously working with, I guess, a softer endpoint? We then had to consider
open label versus blinding. Again, obviously the gold standard is to blind, but really, as I mentioned previously, one of the biggest limitations of the randomized controlled trials for this type of asset is the fact that they are blinded, and you’re not able to demonstrate any benefit of that single daily administration, which should be a major benefit to the patient. Obviously as well, I know we talked about compliance earlier, but we really didn’t want to impact patient behavior through the study with respect to how they receive and go and collect their medication. Not because we didn’t want
them to take it properly, but really so that we could really tease out the effect of the once-daily, single-inhaler option versus the multiple-inhaler
option for the patient. But obviously, because it’s a randomized controlled trial, the sponsor would be obliged to supply the drug, and we don’t control what the comparator arm is; it can be any open triple therapy that the physician feels is relevant to the patient. So logistically it would be probably impossible to supply that for the study anyway. So what we had to do was work with commercial supply, so actually the patients receive their prescription as they would normally from their physician, and their medicine is dispensed by their routine pharmacy. And it's all sorted out behind the scenes: we reimburse for the medication using all preexisting reimbursement channels. We touched on data
quality and consistency. This was where, as well, we decided not to utilize real-world data for the primary and secondary endpoints. We do utilize real-world data in this study to look at health care utilization, and we’re partnering with the health authorities in the U.K., in Sweden, and in the Netherlands to collect that data essentially. But for the study outcomes,
what we actually decided to do was to use centralized spirometry data, and study visits at the beginning and at the end of the study. And we tried to limit the impact of that on the patients’ behavior by enabling the patients to be cared for by their routine care physician, as they would normally, in between those two visits. If we had tried to do a longer-term study then that perhaps wouldn’t have been an option, and we would’ve had to look to use real-world data sources to get at that data. So again, there’s a bit of a balanced choice to be made there about duration of study versus the means of collecting the data. So anyway, I hope that gives
you a little bit of a flavor for some of the challenges and
we’re going to hear much more about a lot of these topics as we go through the next few days. So I just really wanted to
give a message that comparative effectiveness research is hugely valuable for all the
stakeholders involved. But designing these types
of trials is really complex and I’m not sure there’ll
ever be a perfect design. And I think it’s really
important that we have events like this that we have today, so that we can all come together and really think through
some of these challenges, and start to give some
guidance and best practice around how to approach
some of these challenges. As Martin said, we don’t want to talk about cost, but this is a significant investment to run a study, for the sponsors, and the patients as well, and the physicians that get involved. And so we need a bit of assurance
that the data that’s going to be collected is actually
going to be of value to the stakeholders that we’re doing these studies for in the first place. So just to sum up, a lot of these challenges we’ll hear and we hear through, as we
go through the next two days, and I just wanted to flag
The GetReal Initiative, which as I mentioned in the beginning, is really focused on driving
the use of real-world evidence and health care decision-making. It’s a multi-stakeholder consortium. We have representation from
the HTAs from regulators. We did a regulator, sorry,
in the previous project, industry and academia. And the way it’s set up is
really to use the GetReal platform and the ability to bring stakeholders together in a very open forum, a bit like what we have today, to really try and focus on what the key priorities are: the challenges that are in the way of using real-world evidence, but also the opportunities that we have to use real-world evidence in decision-making. And once those priorities are identified, to translate them into task forces, which will take on those challenges and create tangible solutions that will hopefully then not only provide tools and education for individuals to use, but also, through the think tank, translate those back into policy changes and legislation changes as required. And Iris, who’s going to
speak next, is going to really go into a bit more detail about the pragmatic trial task force that we have, which tackles some of the statistical challenges we have with respect to managing bias and switching of medication, for example. But also, we’ve created a tool that helps individuals walk through some of these very complex design challenges that we’re all discussing today. So, thank you.
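As an illustration of the sample-size pressure Elaine describes, where a small absolute effect on exacerbations pushes a randomized comparison toward 30 to 40,000 patients, here is a standard two-proportion power calculation. The event rates are hypothetical, chosen only to show the arithmetic, and are not the actual INTREPID planning assumptions:

```python
import math
from statistics import NormalDist

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect p1 vs p2 with a two-sided z-test
    for two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical rates: 50% of control patients exacerbate in a year,
# versus 48.5% on the single inhaler -- a small absolute effect.
n = n_per_arm(0.50, 0.485)
print(f"{n} per arm, {2 * n} total")  # lands in the tens of thousands
```

Because the required sample size scales with the inverse square of the absolute difference, halving the effect roughly quadruples the recruitment target, which is why a "softer" but more sensitive endpoint such as the CAT can keep a randomized design feasible.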
– Great. Thanks, Elaine.
(audience applauding) (audience clapping drowns out Greg murmuring to Elaine) So, Iris, if you’re on let
us know we’ll turn to you, from Eli Lilly, for the, I guess you’ll be kicking off the reactions to the opening presentation. So Iris, are you on? – [Iris] I’m on, if you can hear me. – Excellent, yes we can. Go ahead. – [Iris] Do you put my slides on? Because I don’t know when
you’re ready with the slides. – Yeah, they’re, your title slide is on. And I think we’ve got
somebody that’s advancing them when you say to.
– Okay. All right, very good. So thank you very much, Elaine, for giving this introduction
and just to clarify, as she said, for the lack of time I’m not gonna cover the entire pragmatic trial work package that I was and am co-leading with my colleague Mira Zuidgeest from the University of Utrecht in the Netherlands. I’m just gonna concentrate on one of the things we’ve done, which is the creation of a tool that helps trial designers to kind of think through a pragmatic trial. But if you have further questions regarding the work package itself and all the other elements
we’re covering there, I encourage you to contact
me or Mira Zuidgeest, and we’re very happy to
accept separate calls, if we think that’s of interest. So in terms of the tool
which we have called the PragMagic tool, as part of our GetReal work package. Turn to the next slide, please. We have identified that the majority of, as Elaine mentioned before, these health care providers, or patients, payers, and increasingly also regulators, as we can see today with the workshop (mumbles), are interested in looking into generating real-world evidence data of good quality earlier in drug development. Throughout the talk today we
call them pragmatic trials. I know that's, you know, a controversial term, but I just make the assumption
that we kind of are thinking about the same design in
terms of randomized trials in a real-world evidence setting. So, the data that we’re interested in is real-world evidence data in the early stages of drug development,
but it turns out that, since it is not the standard study design that has been used over the last decades, the experience both of people doing these trials as well as of those looking at these trials is very limited. So trialists in particular often face very unexpected challenges, both in the planning phase and the setting-up phase of these trials, concerning the methodology and feasibility of pragmatic trials. We have performed, as part of the IMI GetReal initiative, a survey targeting trialists that indicated that up to 82% of the respondents who were involved in either the planning or conduct of pragmatic trials experienced major challenges, both in the planning and the conduct or evaluation of pragmatic trials. And this led to the
abandoning of the planned pragmatic trials in almost a third of the cases. What is also important to emphasize is that various stakeholders as well struggle with acceptability of the data, which makes it sometimes a kind of vicious circle: people set out to do a trial where they don’t know whether the data will be acceptable. The next slide, please. Here's the PragMagic tool, as you see on the slide, and also a link that you can use to have a further look at it. This is a tool that we created over the last three years to help people kind of prepare for these challenges. And it is important to
emphasize here: this is not kind of one-time access where you get a ready trial out; this is much more a discussion tool or a decision-support tool that kind of helps you, as a team, to go through all the challenges that you may face. The next slide, please. So what can this tool do for you? It’s an online tool; it’s free for access, you just need to register
with an Email address, that is not used otherwise. What it does is help you to facilitate and to consider the planning of pragmatic trials. It can help to inform you, as part of teamwork, where pragmatic choices could be made. If you’re, for example, deciding between a more conventional RCT or a more pragmatic trial, it kind of can give you a hint what it means to be more pragmatic or less pragmatic. It can also make you aware of the possible operational challenges that you may face. Obviously it’s a generic tool,
so we’re not giving specific information connected to specific disease areas, but it’s the generic awareness of operational challenges that you could be facing. We would like part of this
tool to help you to assess what the expected generalizability of the results of this trial could be, because most of the research questions you are trying to tackle when doing a pragmatic trial concern data that’s generalizable to the maximum extent. It can also, and that’s almost kind of a sidekick, be used for educational purposes, so trialists who haven’t been dealing with pragmatic trials can use it as a learning
tool to understand what are the differences
between a pragmatic trial and a classic RCT and
when you can make trials into something that is more pragmatic compared to a conventional trial. Next slide, please. So this picture is just a screen sample to show you what you could potentially expect when you register for this tool. On the upper-right-hand
corner there is a dial that shows the choices between six domains; the domains are generalizability, bias, precision, stakeholders, cost, and duration of a trial. And it shows you the kind of implications, in each design domain, from participants and outcomes through to data safety, of the issues that could be evolving when you do a pragmatic trial. So here is the example: we use participants. In participants you go, you select that as the domain, and you’re gonna see several questions in this domain, for example, what the generalizability criteria are for the participants, whether you involve vulnerable patients or not, and so on. So it gives you several questions, and as you answer all of them, it guides you through the implications you have on generalizability, bias, precision, stakeholders, cost, and duration. And it kind of forms a grid that is using a color scheme almost
like a traffic light. So something that is green is kind of on the more pragmatic side. Something that is red is
more on the classic RCT side. What we are not trying to do at all is to judge whether a decision is better or worse; we’re not trying to encourage a trial to be more pragmatic or not, it depends on the research question whether you choose to be more pragmatic or not. It’s just trying to help you think through the questions that are important when setting up the trial. Can I have the next slide, please? So this is just a screen sample of possible outputs you could see. We can develop a fingerprint, on the right side, that
just gives the kind of color-coded overview of what impact the decisions you make for these specific trials have on the different domains. On the left side you see another
possibility which is simply an Excel spreadsheet
that you could download and you can work on it in
a team to kind of improve certain domains, if you
can change them afterwards. So there’s different ways of using the tool in terms of how to. So next slide please. Just to recap, so the aim of this specific PragMagic tool is to facilitate the design and planning of trials that are more
on the pragmatic side. And the way we do it is trying to give insight into the impact, the consequences, of the various design choices and the operational challenges in specific domains. The aim is really to ensure that the data that you generate with this trial have maximum generalizability, while we try to ensure the validity and the feasibility of the trial design. The focus of this specific tool is on randomized pragmatic
trials with a drug component. Obviously a lot of the support is also true for other interventions, like devices or health care interventions that are not drug-related, but we kind of concentrate this on the drug component. What it is not: it’s not making the decision for a trial, it’s just supporting a thought-process to make a decision. It’s also not a checklist to ensure compliance, which is obviously relevant in an interventional trial. And it’s also not a quality tool that kind of picks you up on the quality side. On the next slide, please. It just shows you the link once more and also the Email address of
my colleague, Mira Zuidgeest. And we would be very happy to inform you more, if you’re interested in that specific tool, but also about other things we’ve done as part of GetReal, and we’re very keen also to get feedback. This is the first version of the tool, so there’s room for a lot of improvement, and we would be very keen to hear from anyone who’s interested. – Okay, great. Thank you, Iris.
(everyone applauding) So we’re gonna go to our last one, we are running out of
time for this session, that’s okay with… You get a few minutes for your comments and then we’ll go to… We’ll still have time for a
few comments and questions, I got a list of questions on my side, I’ll probably skip those
and turn to the audience if there are folks that
wanna line up at the mics, but go ahead Steven.
– Thank you. Is it okay to speak from here, or? – Yes it is, it’s fine, yeah.
– Sure. I don’t think these are my slides. There we go.
(Greg murmuring quietly) Okay, thank you. Thanks for the opportunity. I want to describe a couple
of tools that are being developed and coming online
that I think might be quite useful in solving
some of the problems that have been alluded to by previous speakers. And except for a name that I
inherited that I’ll describe in a moment, I won’t
use the term real-world, Bob Temple has told us
that it’s ill-defined. I actually think it’s
worse than ill-defined, but I will use the term point of care. And this is sort of a paradigm
for what takes place in a conventional clinical trial,
randomized trial if you will, where we take individuals
from the point of care, pass through several filters
that include eligibility, and exclusions, access to care, and so on. And using those filters we
define a trial population. Then in that trial population, there’s a data model that’s
either explicit or implicit, that helps us to define our case report forms. These need human curation; those case report forms then
form the database for our trial and eventually through some
analyses and other imaginations we get our trial results. It’s important to recognize
that there’s no set of filters that one can place
between the point of care, and the trial population that
will expand the population, that will make it more
like the real-world, if that’s your preferred term. These are filters; they reduce the population. And for example, we heard earlier in the ADAPTABLE trial the fabulous numbers in
terms of sample size, but if I understood the slide correctly, it was still one out of 40 who were approached for entry into the trial. There’s plenty of
opportunity in that kind of a filter for various selection
biases that plague us. We also know that this
mechanism works extremely well, despite the limitations
that we can all see. We have scads of new
drugs being developed; it’s been serviceable for 75 or 80 years, it’s advanced therapeutics, it’s been amenable to testing many important questions, eliminating losers and selecting good therapies. However, there are important questions that this kind of paradigm wouldn’t work very well with. So here’s an alternative, and in the alternative we’re
going to take our data model and place it inside the point of care, and we’re also going to
take the other components that we need to add structure
inside the point of care so that we can eliminate the pathway that goes through the filter. And by converting the EHR in
particular into structured computable data we then have
a way to produce trial results directly without going through the filter. Now we still have issues around consent and some other problems but I want you to be able to see what the two tools that I’m going to describe very
briefly are intended to do. The first one is called Minimal Common Oncology Data Elements, or mCODE; this is the most
highly-developed at the moment. It’s in version point-nine. It’s a collaboration
with some major entities, I’ll describe in a moment. And it’s an attempt to
describe that data model not in the CRFs but in the
electronic heath record. The second tool was called, ICARE. This is where I’m left
over with the integrating clinical trials and
real-world evidence name, but these are questions
that attempt to capture data at the point of care
that is otherwise rather ambiguously placed or recorded in the EHR. And I’ll show you two
such data elements that we are intending to clean up
with tests of ICARE data. Here is the mCODE, this doesn’t look very minimal, but it is.
(audience laughing) This is the result of a group
discussion with about 60 to 70 data elements that
come from a single use case. There may be, will be
additional use cases that are developed and so the
minimal set will expand, but hopefully not too dramatically. And you can see on here there are elements of various components: patient characteristics, care rendered, genomics, and so on, in the cancer context. And so this mCODE set, we expect, will be widely adopted as a data model
going forward in oncology, and hopefully will live a life of its own. It’s all in the Public Domain, you can Google this up and read about it. The ICARE mechanism is a way to sharpen up or create structured data inside the EHR. Here are two examples of
ICARE questions around data elements that we think
are documented to be very poorly captured and
structured in the EHR, particularly in the physician note. One is around disease status and cancer. It basically creates a structured
answer to a sharp question about disease status and then
this structured data can be plucked out of the EHR using
very simple technology, much less sophisticated than
natural language processing. And then treatment change, another point that's poorly captured: has the treatment changed with this visit, and if so, what's the basis for it? And you can see some of the structure that's offered there around that. At the bottom of this
slide you can see some of the collaborators
that we’ve kept informed and have given us feedback on the creation of those questions. So time is short but thank you
for the opportunity to tell you a little bit about those tools, and I’d be happy to speak
with people individually, if there are additional
questions or comments. – Yeah, great. Thanks, Steven.
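To make concrete what Steven means by "very simple technology, much less sophisticated than natural language processing": once an ICARE-style question writes a structured answer into the record, extraction is essentially a field lookup. This is a minimal sketch; the field names and value sets are invented for illustration, not the actual ICARE or mCODE schema.

```python
# Hypothetical sketch of pulling ICARE-style structured answers out of EHR
# visit records. Field names and value sets are invented for illustration;
# the point is that no NLP is needed once the answer is structured.
visits = [
    {"patient_id": "A1", "note": "...free-text physician note...",
     "icare": {"disease_status": "progressing", "treatment_changed": True,
               "change_basis": "progression"}},
    {"patient_id": "A2", "note": "...free-text physician note...",
     "icare": {"disease_status": "stable", "treatment_changed": False,
               "change_basis": None}},
]

def extract_disease_status(visit):
    """Read the structured ICARE answer directly -- a plain field lookup."""
    return visit["icare"]["disease_status"]

statuses = [extract_disease_status(v) for v in visits]
print(statuses)  # ['progressing', 'stable']
```

Contrast this with mining the free-text note itself, which would require NLP and could still leave the disease status ambiguous.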
(audience applauding) Okay, we’ll take a few
more minutes for questions and comments from the audience. I’ll go to Naomi at the microphone. – [Naomi] Thank you. My question is on adjudication. And that was very interesting, but I’m wondering if that
idea that adjudication doesn’t make a difference, applies
to a setting we often see in evidence that health
plans review, which is… I’m taking, for an example say, an orthopedic procedure where one part, one of the outcomes,
maybe it’s a composite, is you know, a re-operation
maybe for pain or for function, and it’s the treating surgeon who actually decides on that necessity, and when we ask about that, we’re told, “well that’s the treating surgeon”, so this is a real-world problem. And yet it seems to me that adjudication would make a difference
in those situations, because there is no blinding–
– Yeah. – [Naomi] to the intervention. I'd be interested in your thoughts. – Yeah, so just for the record, Naomi Aronson, the Blue Cross Blue Shield TEC? – Yes.
– Okay. – [Naomi] We’re now Evidence Street. – Okay.
– But yes. – So, good question though, around the necessity of adjudication and its impact. I think we still have Martin–
– Yep. – and Iris still on the line, as well. Anybody here that wanna sorta tackle that? (audience chuckles)
They’re deferring to the two that are online, or on the phone. Martin or Iris, do you wanna? And I think you brought
this up too, Martin. So any comments on this question? – [Martin] Yeah, I’ll
have to get back to that. I mean, I think what we'll have to think about is what are the sources of error, and whether they're likely to be biased with respect to the treatment allocation. Now, if the treatment is open label, and the decision about operating, or about some specific operation for example, is made by somebody who's aware of the treatment allocation, then there is the potential for bias in those circumstances. Whether one adjudicates that or not actually won't get rid of the bias. The bias will still exist. There's a different issue, which is in the context where the treatment allocation is concealed and continues to be concealed, and the person making the decision about doing this particular operation doesn't know which treatment the person is on. (phone line beeps)
In those circumstances, even if you had some operators who were more likely, for example, to operate for a particular, perhaps more specialist indication than others would, that will still just be noise. And if it's just noise, the question is: is your trial going to be large enough to overcome that noise? That's something one needs to consider prospectively, to make sure the answer ends up being yes, your trial will be large enough to overcome that noise. So there's a tension between scale and precision: it's scale that addresses noise, while adjudication does not address issues of bias toward one treatment or the other. I hope I put that right, and if I haven't, I'm sure Bob Temple will
put it right for me. – Okay (stammers).
– Thank you. – All right, thank you. Question over here? – [Dan] Dan. (mic cracks)
I may have broken this. (Lesley chuckles)
Okay, Dan Connors from the Biomarkers Consortium at FNIH. So I have a couple questions
and I’m not sure whether to– – Maybe prioritize the
(chuckles nervously) one or– – [Dan] Okay, thank you, thank you. So the first one, I was thinking
back to one of the first presenters, they talked
about the Adaptable Trial and I noted that the
intervention was intended for within 24 hours of the event, and it made me think about how incredibly inaccessible my EHR is, and how someone would be able to identify an event in my experience within 24 hours and get me on a trial. And so maybe I'm misunderstanding, but that seems like a logistical issue that would need to be addressed, and so I was wondering how we would establish that. Searching an EHR, pulling information out, is one thing, but pushing that information to a trial is something else entirely. And that made me think about
a second question which is, should we be discussing
post-market trials as a way to give ourselves the opportunity
to explore a larger patient set but with more
time in order to do that? And then the third question
or point is to Dr. Piantadosi, I really like the idea of
being able to pull out those filters early on but does
that not just then kick those filters down the road
to later establishment once they’re actually on trial? – You wanna take the latter one? I don’t know.
– Sure I think the answer to that is pretty straight-forward. There are filters such
as consent that would need to be applied after the fact, even if we induce the
electronic health record to contain essentially
perfectly structured data like a case report form. However there are types of
research that could support the results of the clinical
trial that could be done without consent such as
using anonymized records. And there you would have the
confidence that the methods of data capture and quality of the data in the EHR was identical
for the trial participants, as for those who didn’t participate. So I do think that the ICARE
method would help to solve some very important questions. Thank you for that question. – [Greg] Okay, next comment over here. – [Rahoul] Hi, Rahoul
Shanonagerashershonaka. I've got a couple of questions for Elaine. I want to ask if there's a plan to do the INTREPID Trial in the U.S. setting? So that's question number one. And do you think it's easier
or more feasible to do a trial like INTREPID in European countries because of maybe (mumbles) fragmented health care system compared to the U.S.? – Yeah, that’s a good question. – Yeah, sure so that was
a very good question. So we do actually have a sister trial to INTREPID that is running in the U.S. It is a different design, we went for the pre-post
option in the U.S. And that was driven by
the feedback that we had, the exacerbation was really the critical, the critical endpoint for the U.S. So I think that’s, you
know for me it’s really designing the studies to the questions that you’re trying to answer. And to your question
about whether it’s easier, I think that study, actually, is being run in partnership with two of the major health plans in the U.S. And so the switching of the medication is the only intervention for that study, and then the data for
that study is being pulled entirely from the claims databases, apart from some safety,
additional safety information. But the identification of the patients and the recruitment of the sites, that's all being done in
partnership with the health plans. So I think from that perspective, you can run these studies
in the U.S., as well. I think in the E.U.,
we’ve struggled a little. We’ve already heard from
Lesley that you know, accessing data regardless of which country you’re going to is challenging. So if you want to run a
real-world study where you want to access real-world
data, at the moment, we found that we can do that in the U.K. We can do that in the Netherlands. We can do it in Sweden. But there’s really not that
many countries where it's possible
– Okay, thank you. Well, since we’re over time already, I’m just gonna piggyback
(Elaine laughs) one additional question
onto that, specifically, ’cause it was around the INTREPID study. And you know one of the things
that you mentioned in there was sort of like the idea of balancing the patient out-of-pocket cost
from the intervention drug, which you could supply
versus the combination of all of the different usual care things that the patients could be taking. That might also be a more
complicating factor in the U.S. Like, how you know, what are… Do you work with a payer, do you work with a distributor, do you work with the dispensing pharmacy, does the patient have like, they’re instructed not to use
their normal insurance card, do they use a special card, like how did you guys work that out? – Yeah, so actually in
the U.S., (stammers) is, I think, the more straightforward solution. So in the E.U., every country has a different solution, 'cause we had to adapt to
the different reimbursement channels that we had in each country. In the U.S., what we’ve actually done is employ a voucher scheme.
– True. – That the patients, so the patients go to their physicians as normal, we get a voucher given to them, which I believe is a little credit card, and they take that to
their normal pharmacy, and then that triggers the
reimbursement of the medication, again, through the usual channels. – Okay, excellent. Okay, wonderful panel. Thanks to everyone for
being on the session, and certainly those of
you who have dialed-in. We’re gonna go ahead and take a break, we’re about 10 minutes over. So what I’m gonna do is say, take a break now, I’ll still give you 15 minutes because you know, everybody likes that. (folks chuckling)
And then we’ll come back at 11:25 and we’ll go from 11:25 to 12:30 for this next session. Thank you.
(everyone applauding) (crowd chattering) Okay, welcome back. In this next session we’re
gonna hear about the challenges and opportunities for
reliably capturing outcomes in the clinical practice setting using real-world data sources. Building on our last session, we’ll dive deeper into how
these underlying data sources are critical for designing
and implementing the trial. And some of the unique
considerations entailed with using real-world data for
ascertaining study outcomes. I'll introduce our list
of panelists and speakers. Elizabeth Sugar is Associate Scientist, Department of Biostatistics at Bloomberg School of Public Health at Johns Hopkins University. Sean Tunis is Principal at Rubix Health, and Founder and Senior
Strategic Advisor for CMTP, the Center for Medical Technology Policy. Joining us via the phone, unable to travel to be with us today, is Atul Butte, the Priscilla Chan and Mark Zuckerberg Distinguished Professor and Director of the Bakar
Computational Health Sciences Institute, University of
California, San Francisco, and Chief Data Scientist at
University of California. David Madigan is Professor of Statistics at Columbia University. Bill Crown is Chief Scientific
Officer at OptumLabs. And Cathy Critchlow is Vice President, Center for Observational
Research at Amgen. With that, I’ll turn things over to Elizabeth for her opening presentation. – Okay, good morning. Just waiting for the slides. – Just go next.
– Okay. (chuckles) So I’m going to give you a
bit of a case study of using real-world data in an
actual clinical trial. This is the RELIANCE trial
which is comparing Roflumilast versus Azithromycin for therapy to prevent COPD exacerbations. And just to start I’m gonna
give you a brief background on the actual trial itself. It is a PCORI-funded multi-center, randomized, parallel,
non-inferiority trial. We are, as I mentioned, comparing Roflumilast,
which is FDA approved for treatment in COPD with Azithromycin. We plan to enroll 3,200
participants with COPD that were hospitalized
within the last 12 months. The follow-up will be at baseline, which will actually be our only in-person visit, and then, actually, there's
gonna be a one-week touch-up just to make sure they
got their prescriptions, then three months, six months, and every six months thereafter until they either have an event, or until the end of the trial occurs. Our primary outcome is all cause re-hospitalization or death. The secondary outcomes are
medication adherence, cross-over, treatment discontinuation,
emergency department, or urgent care use, a bit
less-severe than hospitalization. The NIH-PROMIS measurements,
some PRO outcomes, out of pocket expenses,
since some of these drugs are quite expensive,
as well as weight. We are also of course
interested in the adverse events that are known to occur with these two treatments, mainly, hearing decrement, diarrhea, nausea, and suicidal ideation. Now you'll notice I've
highlighted some of the different outcome measurements that
we have here in bold, and some of them not, and that’s sort of a little
hint coming up of what we will actually be using our
real-world information on. The highlighted ones we can use it, the non-highlighted ones we can’t. So the trial actually has a
large number of different data sources that all go into making
up the RELIANCE database. As I mentioned, the only
in-person visit will be that first baseline visit, all of the
follow-up visits will be remote. And we’ll be collecting
data either through a Patient Portal, through
an Investigator Portal, where an investigator
might look into you know, they actually had the
patient in the clinic, and they find out about a hospitalization, and also a Call Center Portal. You know, these are COPD patients, they’re older, they may
be less-technology-savvy, and so they may not be comfortable
completing their visits, their forms, everything online, and so we will have also a
call center that will call them at regular intervals to make
sure that we collect that data. Now we will be supplementing
this by external data, and some of that will
be EMR from the site, that the coordinators at the
sites will actually review the EMR to see if there are
any reported hospitalizations. We do realize this has the
issue that if they go to an EMR that is not at the same hospital
that they’re being treated at that would be out-of-system
and it could be missed. We will also be using the
National Death Index to try to monitor any deaths
that were not reported. Although we are collecting
extensive contact information from these patients. Now the National Death Index
has a problem that we’ll also discuss when we talk about
the CMS Administrative Portal and that is a delay in
reporting between the timing. And so that may mean we can only use that information for part of
the follow-up period. Now what we are very excited about and what I wanna sort of focus on a little bit at the end here is the FDA’s Sentinel Collaboration. And we really had two goals here, one for us, one for them. You know, share and share alike, so we want to validate and
supplement the trial data. And so we are going to use their
databases, mostly in-patient claims files, enrollment and death files, as well as the prescription drug dispensing files, to really help validate and
see whether or not we are capturing all of the information
for our primary outcome, as well as information
about medication adherence,
emergency department visits. And so that is data we think
we can really capture well in the subset of the population that we’ll have Medicare information. On the other side we
are going to help them with their proof of concept
of the Distributed Regression models to see if you could
run these trials remotely, just using those database systems. So we are gonna get two
downloads of the Medicare data, the first linkage will be in 2022, which we have roughly lined-up
with our interim analysis. For that data we will be collecting the annual 2019 and 2020 data, and I mentioned before
all of the components. The second one we will complete the annual information that we
have for 2021 and then, gain the quarterly access
for the first three quarters. And I’ll talk a little
bit about the difference. Now the benefit is, obviously, that we have a check, a validation of our primary outcome; we have a lot of additional
information on adherence which is always something
that’s hard to capture, although with the caveat
that this will be whether or not they filled their prescriptions, not necessarily whether
or not they took them, and then the proof of
concept, as I mentioned. Now there are a number of challenges here. We have annual and quarterly
files that we can gather. The annual has the advantage
that is 99% complete. We can get the drug information from that. But it has a 13 to 15 month lag, so we have to wait a great deal of time for that information. The second, the quarterly
comes in much quicker. It still have very good completeness, but it doesn’t have part D and so, for the end portion of our trial, we’ll be relying on the
quarterly information, as opposed to the more complete annual, and so that will limit
what we can validate. Secondly we could only validate
for the subset who are a part of Medicare and that
will only be a fraction of our population, so we’ll have
to think about you know, measurement error, other
techniques that we can use to try to extrapolate this
to the whole population. And then finally, as
I mentioned, you know, we can only use this for a
limited subset of outcomes. So, thank you very much.
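The "Distributed Regression" proof of concept Elizabeth mentions rests on a standard idea: for a linear model, each data partner computes sufficient statistics (XᵀX and Xᵀy) on its own patients and shares only those aggregates; summing them across sites and solving the normal equations reproduces the pooled least-squares fit without patient-level data ever leaving a site. Here is a toy sketch of that idea, not Sentinel's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def site_summaries(X, y):
    """Each site returns only aggregate sufficient statistics."""
    return X.T @ X, X.T @ y

# Toy patient-level data at two sites (intercept + one covariate),
# generated from y = 2 + 3x + noise purely for illustration.
def make_site(n):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)
    return X, y

site_a, site_b = make_site(50), make_site(60)

# Coordinating center sums the aggregates and solves the normal equations.
XtX = sum(site_summaries(*s)[0] for s in (site_a, site_b))
Xty = sum(site_summaries(*s)[1] for s in (site_a, site_b))
beta_distributed = np.linalg.solve(XtX, Xty)

# Same answer as pooling the raw data, which distributed regression avoids.
X_pool = np.vstack([site_a[0], site_b[0]])
y_pool = np.concatenate([site_a[1], site_b[1]])
beta_pooled, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)

print(np.allclose(beta_distributed, beta_pooled))  # True
```

The same pattern extends to generalized linear models by iterating this exchange of aggregates, which is closer to what distributed regression networks actually run.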
(audience applauding) – [Greg] Great, thank you, Elizabeth. We’ll turn things over to Sean Tunis. – Great. Yeah, so. As usual for people who’ve
seen me talk before, I’m going to be making
an overly zealous case, just to try to be provocative.
(audience chuckles) But and I’m most interested
in provoking Bob Temple, (audience laughs)
but others are welcome. So I’m in my… Well, let me just get (stammers)… Here is the take-home messages, in case you have Email to do.
(audience chuckles) So measuring outcomes that
matter most to decision-makers, patients, clinicians,
regulators, payers, et cetera, has emerged as a primary
focus for evidence generation, right, patient-focus, drug development, patient-centered outcomes,
that’s all the rage, right? What matters to patients,
that’s what you gotta measure. No matter what type of
study we’re designing, we need to start with
a clear understanding of which outcomes matter most. If we get the outcomes wrong, there’s no point in doing anything else. That’s what I would argue. So once we know which are
the most meaningful outcomes, we can figure out which RWD elements are valid reflections of those outcomes, or figure out how we’re gonna add missing, but meaningful outcomes data. But you know, as complicated as all this, if we don’t get the outcomes
that actually matter to decision-makers and measure
them consistently and reliably, then everything else is
wasted time and energy. So and going back what we used to call pragmatic clinical
trials, or I called them, Practical Clinical Trials, ’cause I forgot to read the
Schwartz and Lellouch paper when I was writing my paper. So, Peter Tugwell, I like his phrase, “Clinical trials are only as
credible as their outcomes”, and our JAMA paper in 2003, measure, about practical trials, “measure outcomes of greatest
relevance to decision makers”, the PCORI methodology committee, “identify and include outcomes “the population of interest
notices and cares about.” And Califf and Sugarman, and their Clinical Trials
paper focus on outcomes, “directly relevant to
participants, funders”. So everybody agrees the
outcomes have to be meaningful. And the fear, and the
challenge, and the risk I think is to prioritize
feasibility over relevance. And if we do that I think the whole enterprise is bound to fail. So here’s probably the
most important slide of, and that’s the other thing is, so what about you know, value where, everything’s about value-based pricing, value-based health care, value-based… Like, value is gonna you know,
save the health care system. If we don’t do value we’re something. But–
(everyone laughs) So and what’s value? Value is health outcomes
achieved per dollar spent. So again, if we don’t get the
right outcomes we got nothing. We don’t have, you know,
we don’t have value, we don’t have value-based pricing, we don’t have value-based anything. And then Michael Porter says, “Health outcomes are
inherently condition-specific and multidimensional”, meaning the most meaningful outcomes for diabetes aren’t the same as for COPD, which aren’t the same as for hypertension, and there’s not usually one single outcome that’s most important. Okay so this is a slide you
need to pay attention to. So this is a Consumer Reports Table for electric screwdrivers. So there’s a bunch of brands
in the left-hand column, and then on the top are, you could say, the performance measures, or we’ll call ’em you know, the equivalent of outcome measures, right. So every Consumer Reports Table, they do a bunch of focus
groups and they say, “Well what matters to you when you wanna buy an electric screwdriver?” And I want it to be fast, I want it to be powerful, I want it to run for a
long time when I charge, I want it to charge fast. And they come up with about
a half-dozen of things that consumers care most about, right, and then they measure
those in the standard way. So you start with, what does everybody agree, I mean, people care about
lots of other things, but you know, you need a kind
of limited list of things to make, you know, that
matter to most people, right. So and then you have to
consistently measure all those things so that if someone wants
to buy one of these things, they know, okay, well you know, I want the best balance
of power and charge time, or I really want a powerful tool, I don’t care how long it takes to charge. But you know, people couldn’t decide that. If Consumer Reports
published tables like this, nobody would buy this magazine. Right?
(audience chuckles) Well, it’s like, okay we know
the speed for three of these, and we know the power for two of these, and we know the charge
time for three of these. That is worthless.
– Yeah. – [Sean] Right? You just can’t make any kind of a value-based comparative decision. And we have to remember
that, obviously, this is, I don’t know if people noticed, an analogy to clinical research. (everyone laughs) So, but that’s what, you know, that’s what evidence
tables usually look like. Right, systematic reviews,
that’s what they look like. That’s what we do. So, you know, any… So, you have to remember
that any individual study you’re doing, people aren’t making a decision based on that study, they’re basing these decisions
on lots and lots of studies, and if every study is measuring different outcomes different ways, then you know, you basically
have nothing worth while, or, well that’s a little extreme, but let’s leave it at that. You can quote me on that.
(audience chuckles) It’s less useful than it could be. So there actually is a solution to this. There is something that
people have been working on all over the world, called
Core Outcome Sets, “An agreed standardized set of outcomes “that should be measured and reported, “as a minimum, in all clinical research “in specific areas of
health or health care”. Go on the COMET Initiative website, and they have a database of
hundreds of Core Outcome Sets that have been developed for many different therapeutic conditions. So the work has been done
and these are developed with patients, and clinicians,
and payers, and regulators, so that you have everybody’s input. And then you know, we’ve done some of this work at the Center for Medical Technology Policy; we partnered with the National
Hemophilia Foundation, McMaster, to come up with a core outcome set for gene therapy in hemophilia. It’s a area of active drug
development as you know. So we got you know, a
group of 65 stakeholders from U.S. payers, international payers, regulators, patient advocates,
industry, et cetera, et cetera, et cetera… We went through this three-stage kind of, nine-month structured
modified Delphi-process, and you know, one of the
outcomes was, you know, no surprise, frequency of bleeds, and all of the stakeholders rated that seven
to nine as essential. There was some variability
on the mental health, anxiety/depression outcome, but the patients all
rated this seven to nine. So you know, so this was one
of the six core outcomes, out of 90 that we started
with that all the stakeholders agreed you know, should be
measured in every trial. And I would say, you’ve gotta start there. Once we start to approach figuring out whether we can do a real-world study that’s gonna be of any value, I don’t think it makes
sense to forget about the critical outcomes that
everybody agrees are important. And then we start dealing with
the issues of feasibility. So no matter what type
of study we’re designing, start with a clear understanding of which outcomes matter most, and I’d suggest caution in moving too far from outcome relevance in service of feasibility. Thanks.
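The consensus step Sean describes, stakeholders rating candidate outcomes on a 1-to-9 scale with 7-to-9 meaning essential, is typically summarized with a rule such as "core if at least 70% of every stakeholder group rates it 7-9". The sketch below uses that common convention from core outcome set work; it is illustrative, not necessarily the exact rule CMTP applied, and the ratings are hypothetical.

```python
# Illustrative Delphi-style consensus check. The 70% / 7-9 rule is a common
# convention in core outcome set development, used here as an assumption.
def is_core_outcome(ratings_by_group, threshold=0.7):
    """Core if, in every stakeholder group, >= threshold of ratings are 7-9."""
    for group, ratings in ratings_by_group.items():
        essential = sum(1 for r in ratings if 7 <= r <= 9)
        if essential / len(ratings) < threshold:
            return False
    return True

# Hypothetical ratings for two candidate outcomes.
frequency_of_bleeds = {
    "patients":   [9, 8, 9, 7, 9],
    "payers":     [8, 7, 9, 8],
    "regulators": [9, 9, 8],
}
anxiety_depression = {
    "patients":   [9, 8, 7, 9, 8],   # patients rate it essential...
    "payers":     [5, 6, 7, 4],      # ...but payers are split
    "regulators": [6, 7, 5],
}

print(is_core_outcome(frequency_of_bleeds))  # True
print(is_core_outcome(anxiety_depression))   # False
```

This mirrors the pattern Sean describes: frequency of bleeds clears the bar in every group, while the mental health outcome shows the kind of between-group variability he mentions.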
(audience applauding) – [Greg] Great thanks, Sean. So our next speaker is
joining us from the phone. Atul, can you, are you with us? Can you let us know if you’re there? You might be on mute. I’m gonna say, you could
try to unmute your phone, and see if that works. If not, maybe we’ll go
to the (trails off). – I’m here!
– Okay, there he is. – Can you hear me?
– I was almost gonna go to the next person–
– Can you hear me? Okay, great!
– Yeah, no we can hear you, perfectly, thank you! – [Atul] Excellent. And I’m gonna guess my slides are up? – [Greg] Yes they are! Your title slide is up.
– Okay, great! So, I guess our commission from
Sean is to be both provocative and to present with zeal. So I’m gonna take a different angle and talk about real-world
data from the clinical side. And just maybe talk a
little bit about potential, what we’re doing. So if you go to the next slide, I wanna reintroduce the
University of California to you, and by the way, thank you for letting me
participate remotely here. Just too much travel nowadays. But I’m representing
University of California here, and if you haven’t realized the scope of the University of California, we got 10 campuses, three national labs, 200,000 employees, so we’re
actually one of the larger employers in the United States, and that’ll come up in a moment. And a quarter-million students per year. We have 18 health professional schools, which include six medical schools. So you can see all the funding, 12-plus billion in clinical revenue, 5,000 doctors get a paycheck from us, but 100,000 doctors write orders
on our patients every year, and of course we’re not
just random centers, we tend to think of
ourselves as doing well, compared to others, let’s say, with these lists and top tens. So what we decided
two-and-a-half years ago, or at least publicly announced, was that the entirety of the University of California is aspirationally gonna become a single accountable care organization. Aspirationally means
within five to 10 years. And that will be called UC Health. And the minute you make
a decision to that, you’re gonna need to put all that data, clinical data in one place to both support and look at the cost of
the care we’re delivering and even more importantly, the variation in care practices. So if you go to the next slide, you can just see the
usual kind of, IT side, I wouldn’t be an IT
person if I couldn’t make boxes point to boxes, there
is the one slide like that. And you can see, these are
the six medical schools. Five are major medical centers so, UCSF, UCLA, Irvine, Davis, San Diego, and then Riverside is our
newest medical school there, extremely high-need,
couple thousand patients. But it’s listed there just
like all the rest there. And all the clinical data is in one place. Now, lucky for us we’re all on Epic, for better or for worse, it wasn’t always like that but
we do have that one benefit, and we are now moving all
that data into one place. And here, I’ll talk about
what we could do with it. So, if you could go to the next slide. You could see the kinds of statistics and then I’ll go into what
it means for clinical trials. So again, it’s UCSF, UCLA, Irvine, Davis, San Diego, and Riverside. The number we like to start
with is 15-million patients, because we have, the University
of California has treated 15-million patients
over the last 15 years, and we like this number because it’s five-percent of the U.S. population. Now if you’ll look at the modern era, so that’s data with, that’s
kind of claims data-like, so diagnostic data, MRN,
the medication data… If you look at the modern era, and I hesitate to put modern
and Epic in the same sentence, but when we rolled out Epic in 2012, so seven years ago to the present day, we’ve treated between five
and six-million patients. You can see all the numbers
there, 125-million encounters. My favorite is 626-million prescriptions or medication orders, and one-point-three-billion
lab tests and vital signs. Now more importantly
that’s the legal record, so this is the data from Epic. So I know outcomes are hard, deciding on the outcomes are hard, but this is the legal
record of these patients. And everything the care providers need to deliver care now is in one place. We’ve mashed that up with OSHPD data which is California state regulatory data. We have the text elements,
especially pathology and radiology, that we’re parsing now. I think the first speaker
mentioned the death index. Conveniently, the University
of California runs the death index for the
state of California. So everyone who dies in California, we all could keep tabs on, and we also mash that up
with this database as well. So even if they die
outside of our facility, we know that they died in
the state of California. Now we also have our
own self-funded plans. And this is where it
gets very interesting. We have over 100,000 covered lives, just in our own employees
and their dependents. And so we have the claims data there, too, which means we’re also the payer, right, we’re a self-funded payer. So that’s for the rarest (mumbles) we have the EHR data and the
claims data at the same time. And that creates an incredible
financial incentive for us to figure out what exactly
are all of these medications doing and are we getting the
value out of it, right, as you just heard from the previous speaker. We’re continually harmonizing elements; we’ve got the biotech industry introducing a new drug every week, and we’ve built a whole bunch of dashboards. All right, so if you go to the next slide, you can kind of get an
idea of what that means. This is a very simple bar graph, I just picked one arbitrary blood test. There you’ll see two-point-two-million
LDL measurements. This is just proof that
it’s not claims data, because obviously, you can’t get blood test results, in general, from claims data. You can see, lucky for us, we have a million blood tests under 200, and then sadly, we have
a million over 200. And we should make sure these folks are on the right therapy, but of course, now we can, because we have all the
care data here, as well. Let’s go to the next slide. You’ll see where it
starts to get provocative. So here, this slide is blurry. Now this is a cross between a
Mondrian painting and a Monet, (everyone chuckles)
so you see the colors and it’s all blurry. And it’s deliberately blurry, I’m not gonna show you
guys the drugs, of course, that’s kind of sensitive. But to give you an idea, each little box here is a separate drug, and each color is a different campus, one of the main UC campuses. Just to give you a rough idea, the bottom-left box in blue is UC-Irvine, and they used Statacizaman, that’s their number two drug, and the charge data there is $42,000,000. So this is just 2018. So what you see in this plot, so the entire blurry rectangle
represents $1.5-billion. So I’ma show you
$1.5-billion in one slide. This is our top 10 drugs that we charged for in the year of 2018. Now as you can guess, many of these are biologics. Amazingly they’re not the same biologics! Because UCLA uses one
and UCSF uses the other, and you can guess, of course, we’re gonna do the
comparative effectiveness now to figure out what is the
right UC way to treat anything. Now this makes a difference for trial, so if you see on the next slide, and I apologize for the typos here, I was doing this all at
the last minute last night. So here you can get an idea
what real-world data looks like with actual pivotal studies. So I just took a couple of random drugs, based on a workshop we had recently, here. So if you look at AbbVie
Humira, which is well known, what I can see on the
website was that there were four pivotal studies, two for RA, two for inflammatory bowel disease, totalling about 1,000 to 2,000 patients. And then you can see in
University of California, we treated just shy of
11,000 patients with Humira, with 59,000 prescriptions or orders. If you’d like Celgene
Revlimid, the randomized, multi-center, open-label
trial had 1,600 patients, again, I’m getting this
from their website. So far at the University of California, we treated 5,000 patients with this drug. Regeneron Praluent, if you
look at the ODYSSEY LONG TERM studies, one of several pivotal studies, they had 2,300 patients, so far we’ve treated 1,300 patients, but it’s a newer drug. But of course our numbers keep growing every single day here. So this gives you a rough idea, remember everything we
do here is multi-center, because we have the five major academic medical centers working together here. This just adds up like crazy. And now we’re incentivized to figure out, do we get the bang for the
buck for these patients with ACOs and our self-funded plans here. Let me run through one more example here, and it’s on diabetes, I think one of the other
speakers mentioned diabetes. So if I could go to the next slide. You see this pastel-y kinda diagram? This is the American Diabetes Association, the guidelines from 2016 on how we’re supposed to treat type 2 diabetes. I think this is well known and
loved by primary care docs, and diabetes specialists. And you can see why you know, this, it’s great we have a guideline
like this, it’s wonderful. Many diseases we don’t. But you can also see kind of
the confusion in the middle. So what this says is, after you’ve tried to get
your patient to lose weight, and exercise, you start
Metformin at the top, and then when that fails, you try one of the six
categories in the middle, and then you add in the
other five categories, and then if everything
fails you go to Metformin and insulin at the bottom there. But there’s not a lot of guidance there in the middle, right? So which one do you use? Do you use Sulfonylurea, do
you use DPP-4 inhibitors? And obviously, it’s a
200x price difference. A 200x price difference
if you choose one box versus the other box. So what we started to do is just ask, what is our real-world practice pattern in the University of California. So the way I think of
guidelines like this, you might have ever seen… Have you ever seen a pachinko machine? You know in Japan,
these pachinko machines, you drop a ball at the top, and then you’re kind of making a bet, like, which way does the
ball fall in the machine. That is this guideline, right? So the patient starts at the top, (everyone laughs) you’re trying to figure out which box do they go to next here, right. It’s literally a pachinko machine, okay, so if you go to the next slide. This is what’s kind of deep, like figuring out the pachinko machine. So here we’re starting with, you see the circle and you
see UCSF in the middle, so I’m guessing around that slide. (audio feedback)
So we used to call these Diabetes Doughnuts, and then we realized doughnuts are inappropriate for
diabetes (audience laughs) so we call these Lifesavers now. This is literally how we started 26,822 patients at UCSF with type 2 diabetes. And a third of ’em start on Metformin, in the yellow, which is great. A third start on insulin,
which is kind of interesting. And then you can see all the other ways we’ve started patients, including the pink, which is DPP-4 inhibitors, very expensive ways to start patients. Now we start a patient on this and what do we do next, right, we send the patient home, we ask them, we might get a hemoglobin A1C. We send them home. We say, come back to clinic in 90 days, and if we’re lucky, they come back. And then we make another decision. So hit the next slide. And you can see this is the next thing we did to each patient. So the rings we’re gonna
keep building like this. So the yellow by itself, you see about half of those, the
first set of yellows, they don’t have another ring; that yellow means those patients were on Metformin and we never changed the dose again. Yellow changing to yellow means we still had Metformin but
we had to change the dose. And any other color change
means we added a drug, subtracted a drug, changed a drug. Okay we send them home,
they come back to clinic. Go to the next slide. This is the next move. We send them home, they
come back to clinic, and hit the next slide, and that’s the fourth move. So now in this weird
kinda chess game of sorts, our pachinko machine, these are the first four moves we’re making on the patient and the disease. And if you count them up actually, we have 1,600 different ways
to play the first four moves of this weird game of type 2 diabetes. Probably too many. Probably unnecessary
care practice variation. And remember, we got the
A1C up and down the graph. Now with one button,
you hit the next slide. And now we can scale in entirety the University of California, and you can see 159,000
patients with type 2 diabetes there with 728,000 medication orders. And you can see how many ways we have across the entire University of California to actually play this game
against type 2 diabetes, probably too many. And now we are incentivized
to figure out which, what is the efficacy, not just side-effects, what is the actual efficacy of all these drugs that we’re paying for? You hit the next slide, you
can see we’re actually getting pretty good at using
machine learning to do this. This is a very simple graphic of, can we predict where a patient’s
A1C is gonna be in 90 days, versus where they actually are, what we actually observe, so it’s observed versus predicted. And you could see we’re pretty good now. We can machine-learn what the
A1C’s gonna be in the future. If you hit the next slide again, you can see with even
a simple decision-tree we can start to make our
own kind of guidelines. When should we use Metformin? When should we skip ahead
a couple steps? (sneezes) It turns out that if your
patient has a very high, you look over there and
see, fasting glucose, you know you should try
to skip the Metformin. In our hands it doesn’t seem to work. You’re gonna make a move anyway, so just make that next move faster. And on the final slide to
be the most provocative, I’ll just say that we now
have up to seven years of follow-up data on many of our patients. UCSF and UCLA put in Epic seven years ago, and now we are starting to
look at, say, kidney health,
the multi-center longterm comparative effectiveness
studies, including cost data, on type 2 diabetes, hypertension,
lipids, and everything else. And what’s amazing is this
entire database is paid out of operational dollars, so we don’t need a single grant to get
this database launched, because the operational
side of the business now sees that much value from
having this database. So let me just stop there, thank you. – Okay, great.
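As an illustrative aside on the observed-versus-predicted A1C modeling Atul describes: a toy sketch with invented numbers, where a one-variable least-squares fit (`fit_line`, a hypothetical helper) stands in for the real feature-rich machine-learning model.

```python
# Toy sketch of "predict where a patient's A1C is gonna be in 90 days."
# A one-variable least-squares fit on made-up values stands in for the
# real model, which would use many features (meds, labs, visit history).

def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

# current A1C -> A1C observed 90 days later (entirely invented data)
current = [6.0, 7.0, 8.0, 9.0, 10.0]
future = [5.9, 6.7, 7.6, 8.4, 9.2]

slope, intercept = fit_line(current, future)
predicted = [slope * x + intercept for x in current]

# "observed versus predicted": mean absolute error on the toy data
mae = sum(abs(p - y) for p, y in zip(predicted, future)) / len(future)
```

On this toy data the fitted slope is below 1, i.e. A1C drifts down under treatment; a real model would of course need held-out validation rather than in-sample error.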
(audience applauding) Thanks, Atul. We’ll turn things to David Madigan. – Thank you very much,
my remarks are gonna be, a little more, kind of wonk-ish. So I basically, the basic
point I wanna make is, you know, in classic RCTs
we proceed as if we measure outcomes without error,
for better, for worse, we could talk about that. That doesn’t fly in real-world data, so it’s inevitable that
when we measure outcomes, there’s going to be measurement error. My basic point is, we need to measure that
and account for it. This is the wonk-ish part, just, I’m gonna throw around
terms like sensitivity, and positive predictive value, just so we’re all on the same page. Everyone’s probably seen this before. For binary outcomes, it’s obviously a different
story for continuous outcomes, but for binary outcomes, there’s the predicted outcome, what you measure in your database, and then there’s the actual
outcome, the ground truth. And there are many metrics you
can look at in this context. Two in particular, one is sensitivity, which is, of those people
who have the outcome, how many do you detect? And another one is
positive predictive value: of those that you say
have the outcome, what fraction of those actually do? So you know, any one metric
doesn’t capture the whole story, you need at least two in order
to correct for measurement error in any analysis you do. So that’s my basic point. Two things I’d just like to touch on. One is, there’s a project
which many of you are familiar with called OMOP, which was
a public-private partnership between the FDA and
Pharma, led by the FDA, that concluded five or six years ago. What OMOP did was a big kind of bake-off, an empirical evaluation of
methods for causal inference in a bunch of databases
against a ground truth. A bunch of positive controls
and negative controls. So here the colors indicate: the blues are negative
controls, the reds are positive controls; there are outcomes
on the vertical axis, and drugs on the horizontal axis. So one of the things we
did extensively in OMOP and it’s published in several papers, is look at a variety of
definitions of outcomes, which went from things
that were very sensitive, to things that were very specific. So for example, for acute liver injury, one of our outcome definitions was, the occurrence of at least
one broad diagnostic code. Right, so very sensitive,
not very specific. All the way down to, if
you look at number four, theirs involves lab values
and diagnostic tests, and so on and so forth. The take-home message from OMOPs… This slide is incomprehensible. The take home message from
OMOP was, there isn’t a winner. So it was not the case
that for causal inference, you’re better using the most sensitive, or the most specific
outcome, it all depends. It depended on the outcome, it depended on the method, it depended on the particular context. So there’s a complex interplay
between the operating characteristics of your
outcome and causal inference, message number one. The second thing I
wanted to highlight was, most people in the room are
probably aware of OHDSI. So OHDSI is a collaboration of
researchers around the world. Several hundred researchers
around the world, studying observational data,
studying real-world data. And the union of the data that’s mapped, the OMOP Common Data Model in the OHDSI network has about 700,000,000 patients. There are several
activities going on in OHDSI that I think are relevant
to this discussion. And there is an effort
to build a phenotype library within OHDSI; it’s not unrelated to the COMET project that we heard about. There’s a Shiny App Viewer where you can view the phenotype library. It’s very nascent right now, there’s a lot of work there to come. There’s very interesting work going on creating phenotypes probabilistically. So the basic idea is that you start with a set of noisy outcome data, you know, records where you believe it’s positive, records where
you believe it’s negative. Then use machine learning techniques, the standard library of
machine learning and AI tools to build a model to discriminate
between the positives and the negatives and then use that to create a phenotype definition, and you can read about that
at that webpage. There’s also a very interesting project called PheValuator,
led by Joel Swerdel, which is to do with evaluating phenotypes, and the basic idea here
is that you take a set of patients where you’re very confident that they really do have the outcome. You take a set of patients
where you’re very confident that the patients do not have the outcome, typically small numbers of patients. And then use that to
evaluate whatever phenotype algorithm was developed, so externally, or using the project on the previous page. So I’m just drawing your
attention to the you know, there’s a lot of activity
within the OHDSI community, directly related to outcomes and measurement of
characteristics of outcomes. So in conclusion, estimating sensitivity, specificity, positive predictive value is a good thing. It is more than a good thing,
it is a necessary thing, because it allows you to
account for the resulting uncertainty, you know, in your analysis. Estimating PPV alone is insufficient. So the next time you see a paper where somebody describes their measurement
of MI, or whatever it is their way of assessing,
and they give you just the PPV, you should say, whoa, right. That’s not enough, it’s
not enough to allow you to back into, to adjust your analysis, your inferences you know,
for the uncertainty. One thing as well is, this
whole game of causal inference in randomized studies,
or observational studies, is not about thumbs up or thumbs down. We’re tired to have this mindset, and you know, from a regulator, from the point of view of the regulator, it’s perhaps necessary, but from the point of view of care, this is all about
characterizing uncertainty, and quantifying uncertainty you know, about the effects of interventions. So ignoring measurement
error, simply, is bad practice, we’re deluding ourselves. So we need to account
for measurement error in the inferences and predictions we make. Technically it’s fairly
straightforward, you know. The challenge of
characterizing the uncertainty is not such a simple thing, but it’s, you know, I strongly
believe we’re deluding ourselves if we proceed
as if the outcomes we’re measuring in real-world data
are known with certainty. I’ll stop there, thank you.
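As a hedged illustration of the two metrics David defines, plus the standard Rogan-Gladen correction for recovering a true outcome rate from an error-prone measured one (every count below is invented, not from any study):

```python
# David's two metrics for a binary outcome, plus the classic
# Rogan-Gladen correction: true rate = (observed + sp - 1) / (se + sp - 1).
# The confusion-matrix counts here are illustrative only.

def sensitivity(tp, fn):
    """Of those who truly have the outcome, what fraction do we detect?"""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Of those we flag as having the outcome, what fraction truly do?"""
    return tp / (tp + fp)

def corrected_rate(observed_rate, se, sp):
    """Back out the true outcome rate given sensitivity and specificity."""
    return (observed_rate + sp - 1) / (se + sp - 1)

# Toy validation sample: true positives, false positives,
# false negatives, true negatives
tp, fp, fn, tn = 80, 40, 20, 860
se = sensitivity(tp, fn)   # 0.8
sp = tn / (tn + fp)        # specificity
pos_pred = ppv(tp, fp)     # 2/3

# An observed 12% outcome rate corrects to a 10% true rate here --
# which is the point: PPV alone is not enough to adjust an analysis,
# you need at least two operating characteristics.
true_rate = corrected_rate(0.12, se, sp)
```

This is the binary-outcome case only; continuous outcomes need a different error model, as the talk notes.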
– Great, thank you. (audience applauding) Thanks, David. Next will be Bill Crown. – Hi everyone! I am primarily an
observational research guy, but I have some friends that do trials, so I’m gonna mainly talk
about the data-related issues of these kinds of hybrid studies. One of the things I’d say is, it isn’t all problematic. There’s some tremendous
benefits to the randomization, obviously and we, in
particular, one of them is that, in purely observational data, you’ve got both the
actions of the provider and the actions of the
patient that are confounding. And one of the great things
that the randomization does is solve the balancing issue with respect to the provider, and takes that out. But we still need to be
worried about subsequent issues of non-random sampling, attrition in the different treatment
arms, medication adherence, which is typically measured
with medication possession ratios that are measured over the same timeframe as the outcomes, and that creates a
confounding problem that needs to be carefully
thought about and dealt with. But we do have this
opportunity to kind of bring together, prospectively
collect the data with data that would be
very difficult to collect. We know that even getting
information from patients and their history of
hospitalizations is very, very highly measurement-prone,
so they’re error-prone. So the ability to be able to capture that kind of health care utilization and encounters with the health care system and the detail there is the kind of stuff that administrative
systems do really well. And when you can combine that
with primary data collection on the outcomes of interest and
really important confounders of interest and it really
strengthens the research design. So I’m gonna mainly just talk
about a couple of data issues. We’ve got a bunch of guidance
documents that are out and doing real-world
studies and I think these, even though they’re
not necessarily focused on randomized trials, they’re focused on these data issues that I’m talking about. So these are great
resources for thinking about the measurement issues
and a lot of the analytic issues that come downstream
from initial randomization, which are the methods issues we’ll be talking about later this afternoon. So three big issues are linkage, and electronic medical record systems, and this is basically the problem
that you don’t really know what you’ve got in a medical record system until you can link it
with external information, claims data in particular, that captures broader measures of
health care utilization. You may just be getting
a feed from a hospital, you may be getting a feed
from an oncology clinic. You don’t really know until
you can do that linkage. A second major challenge
with conducting research with electronic health record data is that most of the content is
in unstructured notes. So being able to pull
that out effectively is a real challenge with
natural language processing. So I’ll have a couple
words to say about that. And then the third is mortality data. Unfortunately we’re not all like the University of California
system, and it’s a real challenge with
mortality information. So this just gives you
a sense of the linkage. The column to the left is the
specific source IDs. These are provider groups
that are providing data. And this is just measuring
for an AMI population. The number of AMIs that we saw, hospitalizations that we
saw by provider group, and then whether there was any evidence of medications being administered. And then this issue of the aspirin use, which tends to be captured
often in the ambulatory settings and with primary care. And so you see, there’s
this one really weird provider that had just
one hospitalization, that’s probably a data error of some sort. There’s another two that have really low medication administration. So these are probably lacking
the outpatient experience. So the idea of integrated
delivery networks and understanding what you have for data in terms of being able
to measure safety events that are happening in settings
outside of the setting that’s being captured by
electronic health record data. And also the ability to be
able to control for issues like non-random selection,
attrition, and so forth, is really, really important. Understanding your data’s important. (papers rustling) There is another issue in
electronic heath record data, and administrative data
in general in terms of not capturing
patient-reported outcomes. So this is one of the great
benefits of setting up research networks like PCORnet
and the NIH Collaboratory. Interestingly a lot of
patient-reported outcomes instruments show up in medical record
data with regularity, and these are some of them that do. But they show up in a very sort of haphazard fashion, so it’s very difficult to create you know,
balanced control groups by finding these measures
that just happen to show up in the electronic
health record data. The other issue is the data
quality associated with them. So this is Mini-Mental
Health Status Evaluations. And you see that we’ve got, out of 46,000,000 patients, initially we saw 179,000, about 180,000, who had a Mini-Mental
State Evaluation at any point in time; within this particular, more-recent time frame, there’s 126,000. This is 0.39 of one percent. So it’s a big population to begin with. It’s not relevant for a lot of people. But still, it’s a very small percentage. Not likely to be representative
or generalizable in any way. But there is interesting content there. Atul is involved in a
study that had to do with using machine learning
methods, deep learning methods, in particular, on unstructured
data and medical record data. So one of the challenges that we face is, we have to map all of this data and get it into structured form, and clean it up and so forth, in order to be analyzed. What if we just did deep
learning on the raw data, on the raw notes? And so this was a collaboration with University of California system, and University of Chicago, and Google, where they applied deep
learning just to the raw data. And the re-hospitalization
models that they estimated, mortality models, actually outperformed any that had been previously estimated. But it took hundreds of thousands of hours of computational time to be able to do it. So you need a lot of infrastructure, and Atul could answer questions on this, much better than I could. Mortality data, we have
real challenges with this. Our number one source
has been the Social Security death master file; due to
statutory changes in 2011 around state reporting of mortality data, this has dropped by about 40%
in terms of reported mortality in electronic medical record
data, and that’s a real problem. So there’s been some
analysis of these data and attempts to create work arounds
with composite endpoints. So for example, in health
plan data, enrollment often notes whether
the person died or not; their hospitalization
discharge generally codes mortality
status at discharge, but there’s these people where you just don’t know
whether they died or not. And so that’s a challenge
that we need to be thinking about as important as mortality is. So just to summarize the
missing data due to linkage is something that we really
need to understand and we can primarily do
that through the linkage of claims data and
electronic health record data. Particularly in terms of
just matching the encounters and seeing you know, the percentage of them that we match up. Doing that, you can
then identify the sites that have the most complete data and you can conduct the
studies in those sites. Which gives you
opportunities to be able to do analyses that cut
across payers, for example, and different patient
populations, like Medicaid versus commercially
insured versus Medicare. And there’s a very
interesting issue about the distinction between causal
inference and prediction. And some of these issues don’t matter so much for predictions. So there’s evidence that
the machine learning methods actually work very well
on incomplete data. You have a lot of that, so you get a lot of the
kind of Medicaid data kind of matched with
electronic health record data matched with commercial
claims data, and it shows promise in terms
of signal detection. Also I think Bob mentioned the issue about linkage to registries. So this is one way to overcome
some of the limitations of electronic health
record data is to actually link to highly-structured clinical data that’s captured in registries. Of course primary data collection, the ability to do natural
language processing and deep learning to pull the data out of unstructured notes. And then finally, mortality
data has some of the same solutions that the limitations
of unstructured data do. The linkage to registries is one way to get the mortality data, or collect it directly with
primary data collection. We can all lobby to get
the legislation changed to allow statutory use of social
security death master data. And we don’t really know, we haven’t done enough
research, I don’t think, to know whether or not the
death master reduction, because it was sort of an
across-the-board issue in terms of the states that
were doing the reporting, influences relative risk. So it may be that despite the
reduction, even though there’s a dramatic drop-off in reported mortality, we may still be able to do relative risk
evaluation with those data. (papers rustling)
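A rough sketch of the medication possession ratio Bill referred to, and of the caveat he raises about computing it over the same window as the outcome (the `mpr` helper and the fill data are invented for illustration):

```python
# Rough sketch of a medication possession ratio (MPR): days of drug
# supplied divided by days in the observation window. Caveat from the
# talk: if this is the same window the outcome is measured over,
# adherence and outcome become entangled (e.g. a patient who dies early
# mechanically gets a low MPR), which creates a confounding problem.

from datetime import date

def mpr(fills, window_start, window_end):
    """fills: list of (fill_date, days_supplied) tuples; capped at 1.0."""
    window_days = (window_end - window_start).days + 1
    supplied = sum(days for fill_date, days in fills
                   if window_start <= fill_date <= window_end)
    return min(supplied / window_days, 1.0)

# Two 30-day fills over a 90-day window -> MPR of 60/90
fills = [(date(2018, 1, 1), 30), (date(2018, 2, 15), 30)]
ratio = mpr(fills, date(2018, 1, 1), date(2018, 3, 31))
```

One common fix in a hybrid design is to measure adherence over a window that closes before outcome follow-up begins, or to model adherence as a time-varying covariate.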
– Thank you, Bill. (audience applauding) And then we’ll now turn
things to Cathy Critchlow. – So, thank you. I’m Cathy Critchlow from Amgen. – Let’s click past–
– Click past this, yes? – You know, yeah, just to
the next one, the green. – This one?
– Just the big one. – [Cathy] Big one. – [Greg] There we are. – So I took Greg at his word
that slides were optional. So I do not have slides.
(audience chuckles) But I do appreciate the
opportunity to provide a sponsor perspective, but
they’re really my perspectives on key issues involved in
leveraging real-world data for outcome measurement. So I’ll talk about three main points. One is you know, this effort
on using real-world data, but a companion trend or effort
that’s quite prominent now is just (stammers) increasing focus on patient-focused drug development. And here what’s really key is that there’s the potential to develop, identify, capture endpoints
that are relevant to patients. And we’ve talk about you know, using real-world data to recruit patients, to capture data from patients
at point of care in trials. But also there’s increasing efforts on using mobile health apps, on using other kinds of instruments to capture patient data using wearables and that type of thing. So all of this together
points to a need to really define endpoints that
are important to patients but also one of the issues is, are these endpoints relevant to regulators and other stakeholders. So one of the, you know, the
other point that comes along and that is again some of
the developing guidance on these endpoints is
how do we make sure that these endpoints are relevant to patients but equally relevant to
regulators and payers so that medicines become
accessible to patients. One interesting thing
from a sponsor perspective is how do we overcome the
functional silos associated with doing these studies with the
rigor that needs to be done. So for us it’s, you know, we think of, how do we get the input and I like Lesley’s thing
that evidence generation in this sense was a team
sport, it really is, and I think the people
in the panel all reflect the various people that
contribute to this. But it’s, you know, we have
our operations colleagues, our development colleagues, our biostat colleagues, our epidemiology colleagues, and so it really is, you know, we’re spending a lot
of time trying to figure out how do we really bring these
people together to develop the capabilities to do these studies well. And it’s hard. And I don’t think, I don’t think that’s specific to us, I think that’s also, all of
the other stakeholders here, trying to bring people together
with a relevant expertise to do these well is something that I think we all need to be working on. So what are some of the
issues that we think about when developing,
using our real-world data, and outcome ascertainment? So one thing that we’ve
talked about is we think about potential use
cases to bring forward, is to what extent do
outcomes, real-world outcomes need to mirror, or
match, or be the same as, those that have been used or
accepted in clinical trials, so one issue that we’ve
certainly found is that in clinical trials you
have routine assessment and collection of data. So you’re capturing data
in clinical trials that you might not be capturing
in real-world data. And one that comes to mind is you know, one of our drugs is for
skeletal-related events, for bone metastasis in oncology patients, and there in the clinical
trial with routine assessment, we’re picking up asymptomatic fractures, we’re picking up procedures that are not coded in ICD-9 codes, like surgery-to-bone or radiation-to-bone. So a real-world data endpoint would have to be pathologic
fracture, for example, something that takes a patient to a health care environment to be assessed. So one has to think about, well what endpoints are reliably
measured in real-world data and to the extent that again, if there’s an endpoint
that’s important to patients and whatever it’s, if the
real-world data endpoint is something that we
should be focusing on, we really do have to pay attention to, as we think, going into the future of, how do we evolve our
infrastructure to be able to capture those endpoints in
a way that’s meaningful? So that just introduces the question of, are there different
outcomes, you know, there may be endpoints for which regulatory guidance doesn’t exist. So clearly, that’s going to
be part of the framework. But we have to think about you know, validation of these endpoints. So one area that you know, I’m, we know what’s going on. That there’s a lot of
efforts going on with respect to replicating clinical
trials and real-world data. So there’s you know, efforts going on, where we’re looking at
real-world data after the trial’s been completed
to see what you know, what we can learn from that. And certainly we can
learn from the successes, and everyone would point to that and say, oh great, real-world you
know, observational data, real-world data can be used reliably. But we also need to
make sure we’re learning from the failures to replicate trials. And it’s not that again, that the observational research, or real-world data is faulty in some way but we have to think
about well what you know, why didn’t they replicate. So you know, another area that
we’re thinking about is well, we’re doing a clinical trial. We could also be looking
at in real-world data, you know, patients that
meet the entry criteria for a clinical trial and
then looking at those endpoints in the real-world
data cohort, and again, and do a real-time parallel comparison between what’s going on
in the clinical trial versus the real-world setting, as well as, potentially linking clinical
trial patients’ records to their electronic health records, and again doing real-time
evaluation of endpoints. So, we know that part of our mission here is to inform development of the framework. And again, that’s very difficult
because there’s a catch-22. I think sponsors want some type of clarity in terms of what’s going to be accepted. On the other hand we don’t
want it to be too prescriptive, because again, we wanna be able to harness the power of real-world
evidence without going down that slippery slope of
lowering the evidence bar. So again, as we think
about how to do this, it’s, we have to think about, well what are the questions
that we’re really addressing, and that was brought up earlier as well. So to the extent that we’re
addressing the question, we may be addressing the question of, well, how well do drugs work
in clinical practice settings, which is a very different question than, what is the best estimate
of the treatment effect of a particular drug. And we have to be comfortable with knowing what questions we’re actually addressing. And be able to work with that and know that we may not
be addressing the question that a clinical trial would address but we’re addressing a different question that’s equally important. (papers rustling) I think with that I’ll conclude. Thank you.
(audience applauding) – Thanks Cathy and thanks
all of our presenters. A lot of great topics were covered during these presentations. I wanna ask a question or two, and then we’re gonna open
it up to the audience, again at your mics, so be thinking, okay… We got people going already.
(everyone chuckles) I’ll get to my question really fast. So I really like, Sean you opened up with this, you know this sort of a process
that CMTP and Rubix sort of took in terms of trying to get like, what are those most meaningful outcomes, and the modified Delphi, all
of those kinds of things. And you arrive at, you know, I think that like, wouldn’t
the best case scenario be the outcomes that you came to
in that process happen to be the most meaningful to patients, were really good outcomes
from the regulatory context, were really good outcomes, or like, these are the things that
payers are looking for, and there are outcomes that
can be reliably measured in real-world data sources. Like that’s, whoa, that would
be great if we had that, (audience chuckles)
but we don’t, so… You know, in terms of that process, like, what do you think, or
anybody else on the panel, like, when we’re trying to figure out, of those outcomes that
are really meaningful, according to the stakeholders, how do we know which ones, oh, you really should go to real-world data sources to measure those, or well you can kind of
measure ’em in real-world data, but it’s probably better to
sort of prospectively measure them for the traditional clinical trial, or like somewhere in the middle, where it might, maybe you do
some sort of hybrid approach where you know, you sort of
collect some of the information from the trial but then
you’re also supplementing that with the real-world data sources. Like, what’s the, what do you
all think is like the best next steps for a process like that, of trying to figure out which ones are– – I just gotta, kind of a
quick first pass, is like, I just think, you know, it’s
useful as a starting point to have a pretty clear awareness
of which of the endpoints, which of the outcomes matter most. Before you then go to
sort of trying to deal with the tension of how hard it would actually be to get that information. But I would say, for example, from the work we did on
gene therapy in hemophilia, where if one of the six outcomes
has something to do with you know, mental health
functioning, right, which is, and universally, you know, endorsed by the patients particularly, but eventually all the stakeholders. You know, you’d be pretty
hard-pressed to say, it would be you know, useful
to not think hard about how you’re gonna get some, you know, some reflection of that through you know, whether it’s real-world data, we know prescription information, or something, or something, but just, what I’m just
really arguing for is like, that’s the starting point
before you sort of start to whittle down into say, you know, what’s actually possible. And if you know it’s actually
important I think then, you know, and maybe the, well
it’s not gonna be amenable to a real-world study,
unless we actually you know, get that from, you know,
get that endpoint somewhere. – [Greg] Any other panelists
with comments on that? No.
– It’s Atul. If I could chime in?
– Atul, yes please. – [Atul] Yeah so I think
there’s gonna be a certain set of studies that are gonna be very amenable to real-world data. I have this running list of 20 of these, but I’ll just mention a few now. Natural history, synthetic
or simulated control groups, studies to get biosimilars up and running, you know, to inform the
kind of, current development of biosimilars, expanding
indications across countries, across age groups, like with Pfizer, IBRANCE, the male breast cancer drug. How to trim the trials. Do we need all that CRF data anyway? You know, trials for the future. General efficacy versus
approval drug trials. The efficacy in specific populations. So there are gonna be a lot
of these kinds of studies that are very unique
to real-world evidence. The hybrid ones are gonna be
harder to figure out, but I think, there’s a
whole class of studies that I don’t think we’ve really
been able to do well, but with so much digitization
of clinical data, might now be enabled. – [Greg] Okay. Okay, Jesse? – [Jesse] Yeah, thanks. Jesse Berlin, Johnson & Johnson. I wanna just follow-up on something. I’m being disciplined here, ’cause I had a whole list of questions. (audience laughs)
But I’ll focus on one for David and it’s kind of a comment that I want you to respond to. You mentioned this idea that measuring the characteristics of an outcome measure and then not doing anything
about it is not enough. And I just wanna get back to
something Lesley mentioned this morning, which is the validation
substudy they’re doing. The follow-on to that is, even in principle, you
need to take the results of that validation study and
then apply them to the results of the big study to adjust for whatever misclassification there is, and that has a cost. Since you’ve estimated
the error with error, you have to carry that error through, and you end up inflating the variance. And just to be blunt, one of the reasons I mention it is, it’s a little bit different context, but I’ve seen, there’s a
paper by Martin and Jimmy, and Mike that got criticized
for explicitly taking error into account, inflating the variance. And the criticism was, well, you guys are just
some bad industry people and you’re inflating the variance to make a signal go away. So then, there are some really important implications to all this that
people need to understand. – Yeah, I mean I think it’s, it’s not a technical problem. It is more of a cultural problem. So it’s you know, we’re
not accustomed to routinely measuring the, you know,
estimating the measurement error that is associated with our outcomes. We do that to some extent, but generally what we do, is we measure it using, you know,
PPV or something like that, and say, gee look it’s not bad. And then we proceed, as if it’s perfect, and that’s a cultural problem. So there’s no particular
technical problem, you know, with regards
to using error rates that were measured on your own data, you know, and then using that
to account for uncertainty in the inference, and then
there are all sorts of methods that are available for doing that. But there is a cultural problem that… You know, I think, with
Martin Landray earlier, talked about essentially,
he gave an example earlier, he said, essentially, let’s
suppose the sensitivity was around about 80% and then he showed, gee look at that, doesn’t make
that much difference. Well that’s kind of
not good enough, right. What if it’s two-percent? And you’ve got to measure it
in order to make statements about how it affects the, you know, how it impacts the inferences. – Yeah that’s a good point
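The adjustment being discussed here can be made concrete. Below is a minimal sketch in Python (all numbers are hypothetical, chosen only for illustration) of one standard approach, the Rogan-Gladen correction with a delta-method variance: sensitivity and specificity estimated in a validation substudy are used to de-bias the apparent outcome rate, and because they were themselves estimated with error, their sampling error is carried through, which inflates the variance relative to treating them as exact.

```python
# Sketch: adjust an outcome rate for misclassification using sensitivity
# and specificity estimated in a validation substudy, and carry the
# validation sampling error through to the variance (delta method).
# All numbers are hypothetical, for illustration only.

def corrected_rate(apparent, sens, spec):
    """Rogan-Gladen: true rate implied by an apparent (observed) rate."""
    return (apparent + spec - 1.0) / (sens + spec - 1.0)

def corrected_variance(apparent, n, sens, n_val_pos, spec, n_val_neg):
    """Variance of the corrected rate: sampling error in the apparent
    rate PLUS the error from estimating sens/spec on finite validation data."""
    p = corrected_rate(apparent, sens, spec)
    var_apparent = apparent * (1.0 - apparent) / n
    var_sens = sens * (1.0 - sens) / n_val_pos
    var_spec = spec * (1.0 - spec) / n_val_neg
    return (var_apparent + p**2 * var_sens + (1.0 - p)**2 * var_spec) / (sens + spec - 1.0)**2

apparent, n = 0.12, 5000      # outcome rate seen in the claims/EHR data
sens, n_pos = 0.80, 200       # sensitivity from the validation substudy
spec, n_neg = 0.98, 200       # specificity from the validation substudy

p = corrected_rate(apparent, sens, spec)
v_full = corrected_variance(apparent, n, sens, n_pos, spec, n_neg)
# "Naive" variance pretends sens/spec are known exactly:
v_naive = apparent * (1.0 - apparent) / n / (sens + spec - 1.0)**2
print(f"corrected rate {p:.3f}, variance inflated {v_full / v_naive:.1f}x by validation error")
```

With these made-up numbers, ignoring the validation uncertainty understates the variance several-fold, which is exactly the "inflating the variance" cost being described; and if sensitivity is very low, the denominator sens + spec − 1 shrinks toward zero and the corrected estimate becomes essentially uninformative, which is why the error has to be measured rather than assumed.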
about the cultural, too, and you are referring to a safety outcome, but if you did the same
thing for an effectiveness, it’d be the opposite, you know, criticism, but…
(everyone laughs) Make sure to keep that in balance. Okay, next.
– Jay Irno, Pfizer. My question is about real-world
data and rare disease. I saw it was a bit, a trial and every other trial’s 50,000. These are numbers that we dream of. But I work now and am
concerned with mucormycosis, which is an infection that is
basically a death sentence; if you get mucormycosis
in the right setting, you don’t worry about
the underlying condition which is a (mumbles). So we’re thinking of how to
generate real-world data, which is potentially the best promise we have in a rare disease to help
the largest number of patients; on the other hand, it is so rare that we cling to every patient
we can have, that we are risking getting wrong information, just because we can’t get hold of them. Just to give you a reference, we ran a five-year study, around the world, in all continents, 37 (mumbles) patients from
37 centers were enrolled and we have 37 patients
over a five-year period, so that’s one patient per
center, every five years. So just thinking about how to apply real-world data in such a setting. – [Greg] Yeah, good points. Any comments from the group? Okay, Naomi? – [Naomi] Yes, so there
was a lot of discussion, and I think everybody agrees. There are outcomes that
are important to patients and outcomes that are important to payers. And probably it’s not so hard to agree on what all the outcomes are, what’s really hard is measuring them. – Yeah.
– Okay. And we are moving more and more carefully, cautiously, thoughtfully, toward what we’re thinking of as preferred outcome measures. To understand what measures
actually capture those outcomes that are of interest, moreover, what’s a clinically meaningful difference. When do you need responder
criteria, as opposed to, because generally we tend to
compare averages, all right. But those may not really
be that informative. So my question is, and this is not an anti-industry question, or an anti-real-world
data or evidence question, it’s a question of genuine interest, because I see lots of opportunity here. And so how do we bring those thoughts into the real-world data sources, in terms of what are the measures and what are clinically
meaningful differences? – Yeah–
– Yeah I think the mortality’s a good example of that because you know, mortality is such an important endpoint. – Yeah.
– Right? And so in a world where you’re just using purely observational data that’s coming in from electronic medical records or claims we know that is,
has really serious problems. And so the thing that
I think is so promising about these kind of hybrid
designs is that you collect it. You know, ideally it would be available for everyone in a reliable way. And part of the learning
health care system, we would have this. But if we need to do it in
a study-specific manner, at least that’s the way to get it, you know, for an individual study. And it’s a great, I think it’s a great example
of what you’re talking about. There are, you know, so we need that for whatever the primary or secondary outcomes
are of a particular study that we’d wanna do. – I just had one more point of this, just sort of triggered me
to reflect that you know, one of the things that I
don’t think works well, and whatever the decisions
about meaningful endpoints are, is for individual
research teams to try to figure out themselves,
consulting with stakeholders. It seems to me like,
that’s what leads you to one of those useless
Consumer Reports Tables, where everybody’s you know,
gone through their own process. So I really think, and I know the FDA has put out some RFPs to do work in terms of harmonizing endpoints in therapeutic areas. So I think, but it’s gonna take
some sort of centralization and agreement rather than, you know, I think we’re lost, (chuckles) separate from the questions
of feasibility if we do this project by project, or
organization by organization. – [Naomi] Well, and I
do wanna emphasize that in seeking these preferred measures, we’re not interested in making them up. We’re interested in
understanding what’s out there, and for there to be some
kind of universal inventory, of not just the measures. But what’s a clinically
meaningful difference and again, when do you need responder criteria. But I will also say, your point Sean, is not only important
to any particular study, it’s important to our
ability to accumulate and aggregate over time in order to learn. Thank you.
– If I can just have one quick thing really briefly.
– Sure, real quick. – I think one of the things is, you know, we as researchers, may as a community agree on the endpoints. We then have to sell it
to the other side to make it part of the common
collected, what’s done, and convince them that this is
not just a research question, but this will be a care question
and will help with the cost and the development of
the patients themselves, and make that sale so then it
would be much more available, through EHR and through
these other common elements. And I think that’s another
bridge we’re gonna have to cross. – [Greg] Yeah, but hopefully
including some of those stakeholders in the process
of developing them, gets by– – Exactly, like the example
given with the UC system. – Okay, we have time for
one more quick question. – [Jerry] Great, my
name is Jerry Christian. I’m from Chicago and I’m
responsible for a population health program in Chicago,
one of the health systems. But I wanted to build on the comments that Sean and Elizabeth
just made actually which is, this concept of which outcome matters, I think Sean showed in one of this slides, it depends on who you ask. There is some variability there. By stakeholder, even within a stakeholder, there may be certain cost
or timeliness issues, that people really are willing
to make some trade offs. I’m reminded of the quote that
sometimes we focus so much on rigor that it leads to rigor mortis, (audience chuckles)
’cause we’re kinda stuck in this endless loop of trying
to get it totally right. So the comment I wanna offer maybe like to hear what the
panel would say is that, just as the outcomes people
care about might vary, their risk tolerance for measurement error might vary as well, if they’re able to make
more timely decisions that are not so costly that they
can’t make any decision. I don’t know who that would
be on the panel but maybe risk tolerance for error
might vary by stakeholder and perhaps we should think
about that in establishing standards that we allow
the stakeholder to decide. – Great, thank you. Okay, okay. Great, so that brings us
to the end of this session. We are going to break for lunch. I’ll be starting back
here at exactly 1:30. So, I apologize you get a
few minutes shy of an hour. It is on your own. There are a bunch of
good restaurants nearby. Also, Sarah at our registration desk has a list of local restaurants. If you’re running late on time, you can bring your stuff
back into this room, just so that you’re here at
1:30, that would be great. Thanks and have a great lunch! In this particular session, we’re gonna focus in on another
very unique implementation challenge that didn’t
come up this morning, but we’re gonna zero-in on it here. And that’s related to blinding. There were many questions about
when blinding is necessary, and within which component
of the trial blinding should occur if at all. In this session we’ll have
a lead-off presentation that will showcase challenges with blinding in the real-world settings. Although this particular trial was not designed to fulfill a regulatory decision, it does provide a really good
example of the challenges related to blinding in
routine care settings. So that presentation will be
followed by panelist comments, and we’ll hear some different
ideas and perspectives on why we should or should not blind
in these kinds of studies. So now, introducing our
speakers, as well as panelists, Simon Skibsted is Director
of Clinical Development and Outcomes Research at Novo Nordisk. Rita Redberg, hi Rita!
– Hey! – Professor of Medicine, University of California at San Francisco. Satrajit Roychoudhury is
Senior Director and member of Statistical Research and
Innovation group at Pfizer. Nancy Dreyer is the
Chief Scientific Officer, and Senior Vice President at IQVIA. And Peter Stein is the Director of the Office of New Drugs at CDER FDA. So with that, I’ll open it to our opening
presentation from Simon. – Thank you and thank
you for the invitation. It is truly exciting to be here. I think this workshop,
at least based on what I’ve heard this morning has
been extremely exciting. I’ve learned a lot, so I’m
very honored to be here. My name is Simon Skibsted. I’m a Director for Clinical Development and Outcomes Research at Novo Nordisk. And for the last couple of years, I’ve focused on employing
new trial designs into our organization,
including pragmatic trials, or trials in a health care setting with a randomization component. And I would like to echo what has already been said that is truly a team effort. It requires a lot of different
stakeholders from various skill areas, data
management, biostatistics, clinical operations, and so forth. Luckily that has also
resulted in a couple of initiated trials, pragmatic trials in various settings and populations. One of those I will
showcase here for you today, the SEPRA Trial. Hopefully that will make the
discussion today a little bit more concrete and I’m sure it
will be a lively discussion. So before I get into the
details of the SEPRA Trial, I will just briefly
touch-upon our considerations for why we wanted to go
into this trial concept, and why we believe it’s important. Then I will go through the
specific trial design aspects of the SEPRA Trial and after that, zooming a little bit in on the blinding aspects of this trial, and lastly, sort of trying
to further elaborate on why we made the decisions that we did. Now one of the reasons for doing a trial with a randomization component in a real-world health care setting, was generally because
we wanted to understand how our drug worked
once it is on the market and once it’s being used in the setting and in the population in which
it’s intended to be used. So truly to understand the
effectiveness of a drug. In this specific example,
it’s a once weekly GLP-1 receptor agonist called semaglutide s.c., for the treatment of type 2 diabetes. And what we wanted to do was, we wanted to assess the
effectiveness of such a drug in an externally valid
setting of a specific U.S., health care system making it
as real-world as possible. But at the same time,
maintaining a high internal validity that we know from
a randomized clinical trial. Furthermore, in this setting
we are able to assess more novel endpoints that we
don’t traditionally look into in our explanatory
RCTs: pharmacy patterns, health care resource utilization, and so forth. So in many ways, these kind
of trials make sense to us from an organizational standpoint. Now the SEPRA Trial, before we started the trial, we sat down and looked
each other in the eyes and agreed on what kind of key components should be part of this trial. Obviously we wanted to have
a randomization component, to ensure that there was
a high internal validity. But we also wanted to
increase the external validity as much as possible. And that goes both with
regards to the population. We have a very broad eligibility criteria list that I will show you later. But also in terms of the setting, we’re doing this trial
in the patient’s own health care setting, trying
to mimic the real-world, as much as possible. In terms of comparator, we have standard of care as a comparator, which means, any other
drug for the treatment of type 2 diabetes, we did not
believe that placebo would be relevant in a real-world
trial, that’s certainly not what the patients go up
against in the real-world. Furthermore, we didn’t believe
that it would make sense to go up against one single drug,
out there in the real-world. When you wanna assess the value of a drug, it’s against what’s already out there. So that was one of the decisions for that, I’m sure we’ll get back
to that discussion later. Furthermore we wanted to have as little intervention as possible. In Europe we call these trials,
low interventional trials. And we were able to do that
because we’re partnering with a payer so we can use
their IT infrastructure, we get ready access
to their claims data, and as you will see later
in the trial design slide, we have very few trial
visits, and few assessments. Normally in our standard,
explanatory RCTs, data collection and monitoring are much more intensive than usual care. Here we wanted data collection
and monitoring to reflect usual care as much as possible. And then lastly, as
already been alluded to, this is not a trial for
regulatory purposes, we made that, we agreed to that up front for a variety of reasons. But I think for the
purpose of this workshop, we can surely discuss it in how this looks from a regulatory perspective. Now the rational of the trial, we did this trial to
inform clinical practice on the comparative effectiveness
of semaglutide s.c., versus standard of care
in a real-world setting in adult patients with type 2 diabetes. And what we wanted to do was
to investigate this long term comparative effectiveness
on a variety of parameters, related to glycemic control,
body weight, health care, resource utilization, and
actually the list goes on. This is just a selected list, but for us one of the key
things was to assess glycemic control, body weight, and health
care resource utilization, in a real-world setting. So now I’m gonna show
you the design slide. We have very few inclusion criteria. Patients need to be adult, have type 2 diabetes, and then
they need to be inadequately controlled on up to two
oral anti-diabetic drugs. Please note that there are
no HbA1c criterion cutoff, or BMI cutoff; inadequately controlled is defined by the health
care professional, as it would be in a real-world setting. If the patient decides to participate, they will be randomized to
either semaglutide s.c. once weekly or standard of care. And the standard of
care can mean anything. Also other GLP-1 receptor agonists, insulin, (mumbles), SGLT-2s. Anything except for semaglutide s.c. And I hope you can appreciate the fairly low interventional set up. We have a total of three
visits in a two-year period, a randomization visit, a one-year visit, and a two-year visit. Since we’re doing this trial in the patient’s own health care setting, a lot of these patients
will come in more than that, and we’ll capture that as well, but we only have three
protocol mandated visits. Furthermore, as I’ve alluded
to, we also here in this trial, have the capability to
collect other kinds of data including claims data. Now treatment can be
adjusted according to local clinical practice, switch to, add-on, or discontinuation of anti-diabetic
treatment is permitted, as it would be in a real-world setting. The only thing that we do not
allow in the protocol is to switch from standard of
care to semaglutide s.c., as semaglutide is the drug
that we’re investigating. It is a U.S. only trial, it’s what we would call
a phase 4 trial, it’s not for regulatory purposes. It is open-label, I’ll go
back to that in a minute. And we’re doing this in
collaboration with our partner. And within this partner’s network, it’s obviously multi-center
across the U.S. Now we have a long list of endpoints, I’m just showing you a selected list, the primary endpoint is
proportion of subjects reaching a specific A1C target, if you will. Other selected effectiveness endpoints that I’m sharing with you
today is individualized HbA1c targets, health care
resource utilization, work productivity, medication persistence, and adherence, as well as number of
hyperglycemic episodes leading to an inpatient hospitalization or emergency room encounter. Now the blinding set
up for the SEPRA Trial, it’s an open-label trial, both the physicians and the patients know what drug they’re on. An important point to
make is that all the drugs that we investigated in this
trial are all approved for the treatment of type 2 diabetes,
they’re all on the market. And in order to mimic the
real-world as much as possible, once a patient is randomized, let’s say, they’re randomized through semaglutide, then they will receive a
prescription for semaglutide from their doctor, and they will go
to their own, usual pharmacy to pick up the drug, as they
would in a real-world setting. Another aspect that we’ve done
is that we do realize that there are differential
out-of-pocket costs, depending on what kind
of a drug you’re on. There are various ways
you can handle that. You can decide to pay
for the drug, completely, as we do in our normal trials. You could argue that’s
not very real-world. The other end of the
spectrum could be to say, you do not do anything, you don’t pay, the issue with that is due
to the ever-so-complex health care system in the U.S. where
drugs go on and off a formulary, we could end up studying
the U.S. health care system, rather than the drug. So what we have done is that
we have chosen to set a max
out-of-pocket cost of about $40, I believe. And anything above that we will cover. So there is an element
of out-of-pocket cost, again to mimic the real-world. Novo Nordisk personnel were blinded
until DBL, database lock. We have a thorough, extensive
randomization and blinding plan as well as the
statistical analysis plan, with the pre-defined endpoints. All of these were developed
and signed off on before first patient, first visit. So we have sort of that documented, and the trial team as such, are blinded. Now why did we do it like this? We have (mumbles) by now, how to discuss how to go about this. As I see it, there are two key aspects. There are scientific aspects to why we did it the way we did it. And there are operational aspects. In terms of the scientific
aspects, if we were to introduce blinding, obviously we would compromise the level of pragmatism since blinding is not really real-world. We would deviate from
usual clinical practice, and we would sort of move
away from the original intent of the study which is to
assess how this drug works in a real-world setting. Furthermore, the majority of
our endpoints are objective. They are blood tests, glycemic
markers, HbA1c and others. We do have a couple of (mumbles) as well. But for the majority they are,
endpoints lab tests. And lastly, I guess that’s
probably more of a philosophical notion but if you truly want
to assess the effectiveness of a drug, you really want to
tinker as little as possible with the set-up, and the
more you sort of intervene in your trial, the more you
risk changing the behavior of the patient and we
really didn’t want to change the behavior of the patient,
as much as possible. I realize that we have a
randomization component so that is fairly interventional. But other than that, we really wanted it to be as low interventional as possible. Then there are obviously
the operational aspects, since the comparator is standard
of care, and you just saw in one of the previous presentations, the many different treatment
options for type 2 diabetes. I’m not sure it’s even
operationally possible to blind when you have 50-plus drugs out there. Some of them are oral,
some of them injectable, so from an operational standpoint, I think that would be a nightmare. And probably impossible. We’re using the United
States package insert as the reference safety information. We don’t as such have trial product labeling as we do in our normal trials. The trial label as such
is what is being used. And as I mentioned, the patient
will use their own pharmacy to mimic the real-world as possible. So with that I wanna thank
you and I just wanna end, by saying that, and I’m sure we’ll spend the next 60 minutes discussing this. Obviously it is important
to realize that we want to have as high internal
validity as possible. But we want to make sure that
we don’t go so much overboard that we risk the external
validity and thereby moving away from the original intent of the trial. So that balancing act, I’m sure we will discuss
for the next 60 minutes. Thank you.
(audience applauding) – [Greg] Okay, thanks Simon. So, we’ll turn to our first reactor, Rita. – Thanks. First I’ll just comment my
informal index of gender disparities in a field is whether I have to wait in
line at the ladies’ room, and today there was a line, (chuckles) it doesn’t happen at
the cardiology meetings, I’ll just say.
(everyone laughs) Always on time after the break. So I’m gonna talk about key
considerations for blinding in randomized real-world studies and just sort of build on some of the things we’ve been hearing today. I’m a professor at UCSF and
editor of JAMA Internal Medicine. And I’m gonna talk a
lot about medical devices because it’s my own particular interest, and I do think the considerations for blinding are a little different when we think about it
for drugs versus devices. Because as Simon was noting, you know, you can blind a drug study and you don’t have to
have a placebo control, because you can have standard of care, if it’s, as long as it’s two pills, people can still be blinded but that doesn’t happen for a device study because people don’t get placebo devices and so you really do
have to think about it and do a randomized
control trial if you want to have a placebo
control in device studies and I wanna spend a few
minutes telling you why I think it’s really important
to have placebo control. And I’m using the term
placebo and not sham, because I think sham has
this negative connotation of something that is
like, we’re fooling people, whereas placebo is
what we generally accept. And I think of placebo
for devices the same way I think about placebo for drugs. And in fact placebo effect
is even more powerful for devices because for
procedures you know, I think because it’s a
little more invasive, people get a much bigger
placebo effect from a procedure or device than they do from taking a drug although there are powerful
effects in both cases. And so while clearly real-world
evidence has a big role in filling in a lot of
information that we’ve been talking about this morning
I think that first, the safety and effectiveness
has to be established in a well-done, high-quality,
randomized control trial before we can move on to
everything we can learn from adding real-world
evidence and getting subpopulations and much
bigger populations. And so that’s why I think
that FDA approval, most people assume,
should assure safety and effectiveness. Although when (mumbles), who’s sitting right
there in the audience, and I, like 10 years ago
or more started looking at the quality of evidence
for medical device approvals for high risk
cardiovascular devices. We were surprised to learn that actually randomized clinical trials
are not the norm, and are actually done
in only a small minority of high risk devices before
they get on the market. And of course you know, there are a lot of different
considerations for devices, and if you find out they’re not
safe or they’re ineffective, they can’t easily be removed, because generally we’re talking
about implantable devices and that’s risky to remove, also you know there’s a move now, once FDA approves a device to
have immediate CMS coverage and once CMS covers, then
private insurers tend to cover, and then it’s very hard
to do a randomized trial, because there’s frankly, no incentive. I had colleagues, for example,
who were trying to randomize for a trial of the left
atrial (mumbles) device, which was going to be a randomized trial but was already getting
insurance coverage. They said, doctors said, why should I randomize when I can get paid for every device I put in? So it can be very difficult once coverage becomes established, and culture, and practice becomes
established to then go back and do a randomized trial. Although I’m gonna give one prominent example in a few minutes. And so Rob Califf, formerly
at Duke, and now at Google in the Bay Area, but when he was FDA
commissioner was asked, whether sham controls should
be required for device approval and he said, (mumbles) well, “do you want to get the truth or not?” And I think that sums it up pretty well. You know, if you wanna know
whether the device works or whether you’re
getting a placebo effect, you have to have a
placebo arm of the trial. So this was the example I
referred to a moment ago, of how after 30 years of
percutaneous coronary intervention which I’m sure everyone
is familiar with in this, you know, it’s stents
that we can put inside a coronary artery where there’s
a narrowing or a blockage, in order to widen the artery. And so it actually has
been around for 30-40 years. If you go back and look at
the original FDA approval, for the first percutaneous intervention, there was no placebo control, there actually wasn’t even an active control; it was a historical control of a very, in my opinion, mismatched
group and a very small study. And as I’m sure everyone
knows, we do millions and millions of stents
every year around the world. And last year, or two
years ago, a British group, because I don’t think it
would happen in the U.S., did a placebo-controlled study,
the first one ever of stents, where they actually took
both groups to the CATH lab, you had to have chest pain and a blockage in order
to enter the study. And then both groups
thought they got a stint but only one group actually got a stint, and lo-and-behold, they were
absolutely no different, it was a negative trial. It was negative on every endpoint. There was no difference in angina, no difference in exercise time, the quality of life
indicators were all the same. And so you know, 40 years after
we’ve started being certain that this idea of opening up
the artery was gonna be good, it turned out in a placebo-control trial, that it actually wasn’t beneficial and so, well some people say, well, they’re concerned about
the ethics of doing placebo procedures, what about the
ethics of doing procedures on millions of people with
all of the consequent, adverse effects when the procedure is not actually any different than
a placebo, essentially. I would say ineffective. And you know, then you have a
whole, or the whole specialty established, people have been trained. I mean, I could just say briefly, the trial was not well-received in the interventional community.
(audience laughs) Nor was my editorial
that went with it called, “Last nail in the coffin for PCI”. (audience chuckles)
So I wrote it with a colleague, David Brown, and we called for a change in the guidelines for implantation of stents, based on the evidence. And this was an editorial (mumbles) I wrote a few years ago in The New York Times, just before the 21st Century Cures Act was passed, but I think that with the passage of that act, it makes it even harder to be sure that we're getting good pre-market evidence, and what, to me, good pre-market evidence means is blinded device trials, because we're really shifting the burden to post-marketing in order to get faster approval of innovative devices. And I think it's great
for innovative devices, but I don’t know how you
know a device is innovative, unless you’ve actually done the study to show that it’s safe,
effective, and beneficial. But unfortunately, post-approval so far, I don't think it has been all that we hoped it would be, because the
studies generally have a small sample size, they’re
generally not randomized, they’re often a continuation
of the pre-market study, very few of them are actually completed, and they take years to be completed. So we’re not getting the
data that we really need from post market studies that we’re going to be increasingly reliant on. And then finally this was from
The New York Times editors, just a few months ago, where I think, for a number of reasons, the problems with devices, I think the Netflix documentary, The Bleeding Edge, which
maybe some of you have seen, talked about some of the
problems with medical devices and lack of data before
they get on the market. But the editorial just said, “80,000 Deaths. 2 Million Injuries. “It’s Time for a Reckoning
on Medical Devices.” So in my opinion I think the blinding is a really important part of trials, before we go on to the
real-world evidence gathering. Thank you.
(Greg murmuring quietly) (audience applauding) – [Greg] So next, we’ll have Nancy Dreyer, Chief Scientific Officer and
Senior Vice President at IQVIA. – Yay, I'm next, great! Before I get into the slides, that was an interesting example we've seen, and certainly interesting comments about the value of placebos for evaluating whether something works, but I wanna shift the focus specifically to blinding of treatment and thinking about real-world evidence in pragmatic trials, because… So I'm not gonna argue
against placebos but most real-world trials compare
the product of interest to whatever else patients could use. So the idea is they’re often
standard of care comparators, and what we wanna talk about here is, you could say, it’s what’s good enough
for government, right. Now that used to be a pejorative statement but I only mean it in the best way. And really the question comes down to, how much measurement
error is there going to be if we know what the treatments are, and we have preconceptions
and would that be big enough to totally wash
out any evidence of truth? So I’m gonna use my time to share some examples that my
colleagues and I worked up. And I’d like to acknowledge
my collaborators, Cindy Germin, in the audience, who was working with me on this. But we started trying to classify, going back to the outcomes discussion that we were having before, Greg, if you don't mind my covering two topics. But the question is, it's all about fit for purpose: what's your purpose, what's your outcome, so we start there. So what our group did is start to say, well, let's classify the outcomes, and assume it's not a question of, do you blind or not, and is not blinding good enough, but what's the outcome here? So we started out and I have
two slides here where we tried to classify outcomes. And I think my first example row is probably the most arguable: it's emotional states or health reported by a patient. Now most of us know that patients can be tremendously influenced by
the perception of benefit. Is it an expensive product? Is it a fancy new device? Is it gonna work? But I think one of the things that you need to take into account here is the durability of effect. So for example, last time I
talked about this in public, somebody showed me an article showing that a patient's perception of treatment could actually influence their HbA1c. But the question is, how long? I mean, can you really believe… If you believe that this product is gonna
and take away your pain, it might help you through a procedure. We’ve got some good evidence of that. But longterm chronic pain, I find it hard to believe that the power of positive thinking
will carry you through. To me the next analogy
of that is you know, if you believe you can cure your cancer, and we know that doesn’t work. So I think that perceptions
can be overblown, and we need to think about
where the potential for bias is. So here we state that the
patient’s report may be heavily influenced by their treatment. The clinician’s less evident, and that’s where we do have the tool of central adjudication
committee that you could use, or you could have selective
blinding if you needed to. But now let’s move down
the spectrum, okay. Now we have events, or signs, or outcomes reported by the clinician. Now if it’s something
concrete and measurable, it’s less likely to be influenced by bias. And I do think this is the potential for using a central
adjudication committee. So okay, Doc, you’ve made your decisions, now let’s go send that to reviewers who are blinded as to the treatment. That’s not so hard to do. And you can see, based on your notes, without that information of treatment what would they decide? So much less expensive, much… Allows you to be much more
generalizable in terms of your site selection
and, Simon, I was so glad you mentioned the practical aspects of blinding and how it reduces your choice of sites and adds operational complexity. Then we were talking about event diaries, which are actually fairly popular. It's a good tool for finding out: did you go to the hospital, could you go to work today, a number of events that patients can actually count. How many bleeds did you have between this and that and the other… And people are usually pretty reliable about reporting those events, if you give them something that they can understand. If a consumer can understand it, they can tell you about it, and they don't know whether what they're reporting is better than the other arm. So I think that it's
not clear that you have such a big effect that
you could get a washout. Now going down the
spectrum, look at this list. So here we have a lot more
objective tests and measurements. And I would offer to you that these are essentially blinded by the readers. So when I take, when I go
give my blood at the lab, they're not asking me about my treatments; they take the blood, they're gonna spin it and give me the results, and it's essentially blinded to treatment already. You don't need to do something different. Now we go back to the HbA1c argument, you know, did my belief actually
change my lab value. And I’ve put forth my feelings that I don’t think you’re
gonna see those as durable for many endpoints which
aren’t so susceptible to just believing you’re
better, and getting better. But for imaging and a lot of what you see here, physical tests and measurements, the (stammers) message here is, if you can get objective assessments, particularly ones that can be read by a machine, or by a different reader who doesn't know you or your symptoms, you should be able to get a pretty accurate answer, considering how reliable the test is. So I think the questions that we're gonna be dealing with here are, you know… How long could this be? What's the potential impact of the bias? And we epidemiologists and statisticians have approaches for quantifying that. And we can take you through a whole set of what-if examples of how much bias there would have to be to explain away an effect that you see. My last of two points (chuckles)
is I just wanted to say, this is, all around the
world people are starting to appreciate the value of
the pragmatic trials. You know, we have, we’re here
because of our regulatory interest in the FDA, but what we see in Europe is
a great interest in big data, and trying to harness that
for a lot of the experiments that we’ve been talking about
and then we saw this latest guidance, draft guidance
coming out of China, where they’re talking about the importance of practical or pragmatic
randomized trials, and they've already given ground in their guidance, saying they don't expect to have blinding. So I think the point they make, that attention should be paid to estimating and adjusting for, you know, detection bias, is where we're all gonna end up, and this is a movement that
you see going around the world. And my last thought I’d
like to leave you with is after we settle this blinding issue, or understand where it’s really important, and where it isn’t gonna
make that big of a difference, then let's move on to the intent-to-treat and the as-used paradigm, because what we heard from Bob Temple was, if they're not taking the drug
it’s not gonna work, right. The more practical your trials
get, blinding or no blinding, you need to find out if people are actually using the product
how they’re using it because that’s often the answer to why something works or doesn’t work. Thank you for your time.
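The quantitative what-if exercise Dreyer mentions, working out how much bias would have to be present to explain away an observed effect, is often summarized with VanderWeele and Ding's E-value. A minimal sketch, using an illustrative risk ratio rather than any number from the talk:

```python
import math

def e_value(rr: float) -> float:
    """Minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both treatment and outcome to
    fully explain away an observed risk ratio rr."""
    if rr < 1.0:
        rr = 1.0 / rr  # protective effects: invert toward the harmful side
    return rr + math.sqrt(rr * (rr - 1.0))

# e.g. (hypothetical) an observed risk ratio of 2.0
print(round(e_value(2.0), 2))  # 3.41
```

A result of 3.41 here means an unmeasured confounder would need risk ratios of at least 3.41 with both treatment and outcome to reduce the observed effect to the null; the same idea can be applied to the confidence-interval limit closest to 1.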
– Thank you. (audience applauding) Thanks, Nancy. We’ll go to Satrajit. – Thank you. I do not have any slides. I tried to put my–
– Where’s the clicker? – Here.
– Oh. (everyone chuckles) – Excellent. So I think Nancy's talk is a very nice set-up to look into basically the analysis aspect. As a statistician, my training is in statistics, when I look into this blinding aspect, I try to look into its impact on the overall study results. That's one of the key considerations I look into as a statistician. And when we look into such a thing, of course, as statisticians, I mean, we definitely encourage doing standardization or having blinding if possible. But of course there are settings where it is not possible, because it kills the whole pragmatism, or all the appeal, of this real-world evidence trial. So maybe the question is, can we think of alternatives as well in that setting? Maybe if we think about a
randomized real-world study, we can think about alternative randomization, like cluster randomization. That's one of the choices that may reduce this type of bias; of course, it comes with another challenge. When we bring in a different aspect of randomization, you need to have your sample size calculation appropriately matched to that, and that means you need to bring some sort of intraclass correlation into the consideration of the sample size calculation. So the question is, if we can't do the blinding but still need evidence that we can believe in, can we do something analytically to handle that? But of course, not all the biases can be removed using analytical techniques. The other aspect that I try to bring in is that, especially in recent days, people have started to talk about, more than just an endpoint,
they have started to put a framework, especially ICH in a guideline. Recently they have put forward a framework of estimands, which means you start to think about what is the quantity you are trying to estimate in the first place. And here we are hearing about what we call effectiveness. Can we quantify our quantity of interest in a way? And then maybe what was happening due to the blinding would be brought in as a bias, or maybe there are some selection problems that come in with blinding. Maybe talk about the intercurrent events. But of course, we're only thinking about randomized settings so far in estimands, but I guess having that overall framework may be useful, and worth research in that dimension. Now coming back to this
problem of, once again, the blinding aspect, right. One of the key blinding problems that I used to see often, in some of the oncology trials that I handle, is basically the missingness, the amount of missingness that we see, and the missingness of (mumbles) with all the natural language processing algorithms there to use. So it's very important, as a statistician I feel, when we do the trial with people, how the searching was done. Because often there is a gap there; just the word blinding, if you go into a clinical trial database, is not always very evident from there. So one extra caution may be useful, you know, in order to do that thing. And then the question comes in, can we somehow, and I'm sure the next session is gonna talk more about it, correct the bias? So basically once we think
about data which maybe has an effect of knowing the treatment on the causal effects, it becomes a causal inference problem, and then of course the next session is gonna look into, is definitely gonna reflect on, techniques like propensity scores and marginal models, as well as the other techniques that have come in more recently, and how they can work. But for this to work, to
me in a design phase, when I start a trial, I need to make sure that I do
have an accurate sample size in order to do such analysis. And one way to do that, given the recent computational techniques, would be maybe looking into different scenarios and different extremes and seeing what the operating (mumbles) of a trial looks like, how much I can stand by the validity of the trial, basically. And then definitely looking into the missing values, and that's one of the major problems that missing values bring in here. Because you have subjective missingness running into this problem, and the issue is, when you have subjective missingness, you definitely cannot do a complete-case analysis, because that gives you a biased result more often. So it definitely needs much more statistical technique, as I mentioned, some of the causal inference techniques in place in order to make the inference valid, but before we make the analysis, I think it has to start with a consideration at the design phase in order to handle that. So for me, as a statistician, I feel blinding is a very important aspect. But if that's absolutely
not possible and would kill the whole appeal of real-world data, the question is, what can we do analytically, and how much can we do to at least get evidence which is valid and can be used to see the effectiveness of the drug. Thank you. – Thank you.
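The sample-size inflation for cluster randomization that Satrajit mentions is conventionally computed with the design effect, 1 + (m - 1) * ICC, where m is the mean cluster size. A minimal sketch, with hypothetical numbers:

```python
import math

def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance-inflation factor 1 + (m - 1) * ICC for a
    cluster-randomized trial with mean cluster size m."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

def inflated_sample_size(n_individual: int, mean_cluster_size: float, icc: float) -> int:
    """Inflate an individually randomized sample-size requirement
    by the design effect, rounding up to whole patients."""
    raw = n_individual * design_effect(mean_cluster_size, icc)
    return math.ceil(round(raw, 6))  # round first to avoid float artifacts

# e.g. (hypothetical) 400 patients under individual randomization,
# clusters averaging 20 patients, ICC = 0.05
print(inflated_sample_size(400, 20, 0.05))  # 780
```

Even a small intraclass correlation nearly doubles the requirement here, which is the "another challenge" the panelist alludes to.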
(audience applauding) Yeah thanks, Satrajit. So we’ll turn to Peter Stein next. – [Peter] Greg if it’s
okay, I’m gonna sit here. – Sure.
– You already said the lights were hot up there, so– (chuckles)
– Yeah. It’ll make you sweat and yeah, you don’t want that.
(everyone chuckles) – I will say that I have
the pleasure of sharing a lot of stages with Nancy, lately. So I always get to agree
and disagree with her. So this… (trails off)
(everyone chuckles) So, as an acknowledgement, that's my acknowledgement: I'm not a statistician, and I am gonna step back and take it from perhaps a bit more of
a simplistic viewpoint. As I think about, you know, what we are thinking
about, what is our concern. Our concern is the believability, the robustness of the data, how believable is the result. That’s what we’re trying to get, and I look at randomization and blinding, really those are methods
to achieve believability, that’s what we’re using them for, not because we have some
requirement for those in statute or regulation it’s because we have those as ways that we can get
data that’s believable. Randomization is how we
assure balance at the start of the study and blinding
is how we assure balance after a study is randomized. And if you think about
it in that framework and we’re not gonna talk
about randomization to that, I think it’s great that we’re
having a discussion about how do we do randomized trials
because obviously we have had a lot of concerns about
the ways we can draw causal inference without randomization
and that’s a whole other work stream and many
other days of conferences. So we’ll focus on blinding. And I will say that this is
a bit of a broader issue, because and this was brought up before, that not all trials can be blinded and even trials in which
we think we’re blinding, blinding can be imperfect. So we still have to think
about what the influence is of unblinding, whether intentional
because that’s the design or unintentional because you can’t
fully blind the medication. What are the influences, what are the outcomes on a trial? That's what we really have to understand. At the end of the day, what
we’re really trying to do with all of these approaches is to
isolate the effect of the drug from all of the other
influences that can occur and can confound our ability to interpret the results of trials. What are the ways that
unblinding can influence, can cause us not to be
able to have very clearly interpretable data and we’ve
heard about some of these already so I’m not gonna say anything that that hasn’t already been nicely outlined. Obviously subjective
endpoints are of concern. Patients know what they’re on, their response to PROs or
other subjective endpoints or even endpoints like walk tests can be very much influenced by their motivation and what they think the
results should look like. They may well respond in ways that are not just due to the drug. I'm perhaps not as sanguine as Nancy is about these effects being transient, as I think people will sustain these effects when they understand what drug they're on and have an expectation for the effect. And certainly it's not just what people think, in terms of their thinking as it influences their response, but how they behave: there are concomitant interventions, what they're taking, what their diet is, what their activity is. Those are very much influenced
what their treatment is. And those differences between
the groups can certainly influence outcomes in some
settings and some trials, they may not have a strong influence. It certainly can influence
continuation in trials. I mean, the most glaring example
is when there’s a big news story about a particular drug,
or new medical information that we know patients hear
through social media within minutes of a publication in the New England Journal, or, since we have Rita here, JAMA Internal Medicine. We know that those can
certainly influence behavior and the dropout rates can be dramatic. You may recall a trial
with one of the (mumbles) relative to another, where there was a story about one of these agents, and I think within weeks, 40% of the patients in the trial had switched therapies; it was an unblinded trial. These kinds of things can
have dramatic and devastating effects on our ability to
make anything out of a trial when we can’t protect the
trial, the trial’s integrity because of the fact that it’s unblinded. I’d also point out that
reporting is very much an issue. We're not just looking at effectiveness, we're interested in safety. And when a patient knows what they're on, their reporting and collection of information is potentially going to be different. We say, well, we're talking
about serious adverse events or hospitalizations. Yes certain types of events will not be differently collected. But in a real-world trial, when there’s very intermittent
visits for the patient, will we collect all that information? Our health care system, particularly in the U.S., can be somewhat fragmented. So are we sure that we're really actually collecting all of the information,
even for serious events when the patient knows what
they’re on and may report differently in one treatment
group versus the other. So when can we think about unblinded trials? And again, to be practical,
we have to be able to accept unblinded trials or there’d
be certain drugs that simply couldn’t be developed. So without question, we have
to deal with unblinding, whether we do it in a real-world setting or even in traditional settings. Well, certainly hard endpoints versus subjective endpoints can be very useful. As I pointed out, that may help examination of the endpoint, but it doesn't change the fact that concomitant behaviors and other influences can certainly have an effect on the response to the drug. Certainly when there's therapeutic equipoise, if we're comparing two drugs
where there’s no expectation of a difference of response or safety. That can be a helpful circumstance
where we expect patients and physicians not to behave
particularly differently, and so there we might
believe that the differences were just simply the effect of the drug. Another situation can be where there’s really no expectation of benefit. I was talking to someone
a couple of years ago who was doing an unblinded trial of allopurinol versus no therapy, looking at outcomes. And while there was a lot of
prior data there was really not much consensus or expectation
of a difference of benefit. Perhaps there is a situation
where we wouldn’t necessarily expect differences in behavior
in the treatment of patients with asymptomatic hyperuricemia. Where there’s a large effect size, we certainly might think that
it won’t be confounded by these behavioral changes or
changes in the trial conduct continuation of patients within the trial. Well even that I think is a question. And I guess where I would come back to is, it really becomes in my mind
an issue of the believability and the robustness of the data. And I think when you have
unblinded trials, intentionally or because the drug has characteristics that allow patients and physicians to be aware of what the patient is on, we have to think very
carefully and very thoroughly about what differences
can creep into the trial and in our interpretation,
really take those into account. We heard about adjusted analysis and doing sensitivity analysis. I would ask, adjust for what, adjust how? And how much can you, you know, when you’re trying to take
data that is not robust, and adjust it to robustness, that’s where I think, we tend
to get a little bit nervous. And I’m not saying that those
aren’t appropriate but I think thinking ahead in the design
and trying to minimize differences between groups
becomes absolutely essential. But again, I think we do also
have to be pragmatic here, again, not with regard to
trials but just in general, about how we can analyze data. And so I think unblinded
trials can be done. I think we would be
accepting of those designs, but I think it really has to
be very carefully considered, and in very specific settings
where these influences are not so unbalanced
that we can’t believe the response at the end of the day. Thank you.
– Okay, great. (audience applauding) Thanks Peter and thanks to
all of the panelists for just a really good set of comments. And I think a lot of strong themes came up and I really appreciate Rita, you’re sort of bringing up this
power of the placebo effect. And that seems to be very strong. And I think some of the other presenters, and Simon, your study and
some of the other things that Nancy and Peter
were talking about are not necessarily
placebo-controlled studies, but studies in which the
intervention is being compared to usual care and have
brought up, you know… One thing that I heard that was consistent, I think, among the panelists was that this
objectivity of the outcome matters and if it’s a
more objective outcome then you may not be as concerned, or if you can’t do a blinded study, if you have an objective
outcome that’s somehow better than a more subjective
outcome, where you know, obviously it depends on
patient’s perceptions. That seemed to be, but I’m gonna turn it to all
of you to see if I’m right, there seemed to be some
consistency among the panelists that the objectivity
of the outcome matters. I also heard this question
of durability and I you know, I don’t know if there’s ever
been any study to know if, like, this effect of unblinding, this, you know, power of positive thinking, is related to durability of the effect, but Nancy, it made sense what you said, so I'll believe it, I don't know.
(audience laughs) But and you know, I’ll ask the rest of the
panelists to weigh in on that, the objectivity of the
outcome, how does that matter, and the durability of effect, and is that, should that be looked at? Anybody wanna jump in on that? Rita? – It’s interesting, your comment on that because I kind of went
two ways on that point. Certainly from a
physician adjudication standpoint, you know, an objective outcome is much stronger. But from the patient point of view, I think both Nancy and Peter alluded to the fact that, you know, there's such a powerful mind-body interaction, and, you know, I think Nancy's example of how, you know, knowing what group you're in can influence your HbA1c. You know, the SYMPLICITY trial studied a renal denervation intervention where the endpoint was blood pressure, and there was a placebo arm to the trial; well, in the placebo arm, the blood pressure dropped, you know. So I feel if a patient is unblinded to their treatment, it doesn't matter how
objective the outcome is ’cause it could just change their behavior and that mind-body interaction that we don’t really understand
but we see it a lot. – [Greg] Yeah and any other? – Just thinking about Rita’s comment, I wonder if we’re getting
into the question of you know, what’s a clinically meaningful difference because we’ve been talking
about real-world outcomes and practical outcomes
that matter to patients, and to clinicians. And it’s always puzzling on
the patient’s side of saying, oh, your blood pressure went down, it went down, you know, four points. So I think we need to keep
looking at these endpoints and seeing how much of a
difference and is that something that people can interpret, because it’s a… We haven’t talked much about
the patient point of view. And the idea that you come
to the doctor and you know, Dr. Stein tells me, well, I don’t really know, I
could give you this or that, there’s clinical equipoise, well that’s not what I want
to hear from my doctor. I want to hear, I know
what’s right for you. I realize that’s naive
but that’s what you want. And then the idea that,
and then this morning we saw a trial with a
20-year follow-up period. I mean, there are a lot of
longterm questions we want. And the idea that, and I know this is off point, but that we would spend
that much money to make a small difference over time
in a practical question. I’m not sure that’s worth it. – Okay so the other thing
that didn’t really come up and you know, in this
discussion so far as much, is this question of, okay so you have, you know, everything that we talked
about kind of applies to traditional randomized
control trials, too. Like this effective blinding. But you know, specifically
when we’re in the, you know, the products are
already on the markets, or the intervention
drug and the usual care, everything’s already on the market. We’re collecting outcomes. They might be subjective or objective, but they’re still outcomes
that are coming in measured from real-world data sources. Does that by itself sort
of have an impact on, oh, well you really better
blind in this situation versus you probably
wouldn’t have needed to. So does the independent effect of doing like a pragmatic real-world
evidence study matter in terms of whether or
not you blind or, so? Question to the panelist. Because okay, I’ll just keep talking. (everyone laughs)
Because I would say that, Peter you went through a really good list of things related to unblinding, it was hard endpoints,
therapeutic equipoise, no expectation of benefit,
large effect size, I mean, those would apply in
the traditional clinical trial, or in the real-world
evidence world of things. So is there an issue with that? – Yeah, I’ll even answer
your prior question, then get to that one. You know, I think the question in, clearly there’s a lot of potential value in being able to use real-world
data in randomized trials, and I think we’ll have a lot
more discussion about how that might be done and the ways that we can make this as robust as possible. You know, and the practical
issue is that in trying to use real-world settings, blinding
can be very challenging and we’ll have more
discussions about that, but it obviously adds a lot of costs. It makes it, potentially
infeasible for certain sites to even have appropriate
control of drug that would allow for a blinded trial. So there are real issues there. I think again, that’s
where one has to step back and really think about
what the influence would be if you have an unblinded trial and what kind of data
that you're collecting. If we're talking about hard outcomes, so a trial looking at, for example, outcome endpoints like myocardial infarction, or
if you have the appropriate ability to pull in data on looking at cardiovascular or all-cause mortality, and stroke, so hard endpoints, there may be ways that could be a very robust set of outcomes. Again, you do have to try
to work hard to assure that you’re not gonna get
markedly differential treatment because of the unblinded
nature of the trial. But I think there are many interventions where that certainly can be done. I mentioned issue of an allopurinol in asymptomatic hyperuricemic
patients where there’s not really an expectation
of differential behavior. People aren’t gonna run out and try to take a uric-acid
lowering drug necessarily. A large trial, for example, at the VA, looking at hydrochlorothiazide versus chlorthalidone, where there's no expectation that there'd be a difference in the response. One would then think those kinds
of differential comparisons would probably be pretty robust and resistant to the influences
of a lack of blinding. But I think in each instance, you really have to think about
what the impact of blinding or not blinding is on the outcome, and whether behavioral changes
and persistence on therapy, which can be impacted by lack of blinding, how those will impact the outcome. Will it, won’t it? If it doesn’t, if we
can really believe that it doesn’t affect it, and
have evidence of that, then I think that it makes the
results much more believable. – Thank you. So I have one more question
before we get to the panelists, and it’s something that we
haven’t talked about yet, but I do wanna push a little bit on this, because it has come up in
other forums, though it comes up more often with observational real-world evidence, and it’s this: blinding of the analyst. So far we’ve been talking about blinding of treatment assignment. Now the question is, regardless of that, even if you have a randomized study, you’re collecting real-world data, and your outcome might be measured through real-world data sources. And so you have the analyst
who’s collecting that data and coding it and all of that stuff; should that analyst be blinded to the treatment group when they’re looking at those particular outcomes? So maybe Satrajit, I’ll turn it to you, since you focused in on the
statistical perspective. – So yes, from my perspective, because the analyst will be analyzing the data, they need to be blinded so that they don’t bring any kind of bias to the analysis… (trails off) So I think that if you
think about evidence which is solid, blinding of the analyst
is very important, too. – Any other, okay Nancy.
– I might add to that. We heard Satrajit speak, who works for a pharmaceutical company, and the assumption is that the analyst has a stake in the outcome, in the endpoint, right, and that they wanna show that their treatment is better than another one. I work for a clinical
research organization. And you could argue that our statisticians wanna protect the sponsor, but you can get very
convoluted in those arguments, and my close-to-40-years’ experience is, the analysts just wanna follow the statistical analysis plan. They’re not invested in the results. So the question of whether
it’s an independent analyst, or not, and I have no
objection to the blinding. I just haven’t ever yet
seen something where, when you have no economic
interest in the outcome of the study, if you have a statistical
analysis plan that you agreed to at the outset, you just execute it. – I would just add that, you had some, I think very
interesting presentations earlier about the role of adjudication, I think for very common events, I think we heard some
pretty convincing evidence that adjudication may
not always be necessary. It may enhance precision but not change the outcome. I do, mind you, think there’s
been data for many decades that what we consider to be
very objective endpoints are very much influenced
by the reader’s perception of what the disease is. So when you, for example, put a diagnosis code in front of a
radiologist reading an MRI, it influences the results of the MRI. That’s a hard, objective endpoint. And I’d say adjudication, we’ve all seen the adjudication
packages that come through for CV outcome trials, many of them are black and
white and some of them are gray, and I think there is clearly a
risk that an unblinded reader will read something in a way that’s different than a blinded reader. And I think again, if you are
talking about a very large set of events you know, it may
be a more sensible decision, simply not to adjudicate, than to adjudicate in an
open-label sort of way. If we’re talking about
rarer events, I think that, blinding the adjudicator
is absolutely essential. – [Greg] Satrajit, did
you have one more comment? – Yeah, I mean, definitely. I hear what Nancy’s mentioning, but I think it’s still very important, because we’re talking about
often confirmatory evidence. So it’s very important
to keep things blinded in practice, and especially also when interim analyses come in; sometimes we have seen
that, right, in this setting. And I think, as much as
possible, if we keep to that… I think that the believability and
the robustness of the data would be more acceptable to people. – Okay.
– Great. So we’re gonna go ahead, and turn to our audience for questions, and comments at either of
the microphones would… Jesse if you go over there,
you might get called on sooner. (audience laughs) ‘Cause I go back and forth. (laughs) Ellis, do you want? – [Ellis] Hi, I’m Ellis Unger, from Office of Drug Evaluation-I,
Office of New Drugs, FDA. In the government.
(audience quietly laughs) I wanna make a comment, I wanna direct it to Nancy. So we were talking about objective, and subjective endpoints, hard endpoints, softer endpoints and you made
the point that if you have persistence of effect in
a subjective endpoint, it’s probably not, you call it placebo effect, I call it expectation bias, because it persists. But there is a great example of a situation not many people know about, so I’m gonna explain it, where this was shown to be problematic. So back in the mid-1990s, we were, in the cardiology community
we were searching for ways to help patients
with intractable angina, who weren’t candidates for bypass or PCI. And out came this great idea of transmyocardial laser revascularization. So you did a thoracotomy, you took a laser and you
burned several-thousand holes through the heart with a
laser with the idea that, this was actually done, people are aghast, and the blood would percolate, you know, through the myocardium from
the inside to the outside and angiogenesis would occur, and new blood vessels would form. This was, this device was cleared, or these devices were cleared. And patients got remarkable
improvement in their angina. I mean, remarkable, a couple of functional classes. And the cardiovascular community said, well I don’t know if this works or not… It’s, there’s clearly
expectation bias here because people are
definitely doing better. But I can’t believe it’s anything other than expectation bias. But it turned out that
when you look longterm, these patients got benefit past a year. And everybody said, okay. Well it can’t be expectation bias, along with what you said, Nancy, ’cause how could you possibly get such prolonged improvement,
and then it was possible, it became possible to do
this with a sham control, because the catheters were
developed that would deliver the laser energy from inside the heart. So you could take a person to a cath lab, do the laser from inside
and do a sham procedure, and Martin Leon directed
such a study called the DIRECT study, and there was absolutely no
difference in angina relief. There were way more adverse
events with lasers than without. But it was, it’s worth
thinking about that example. ‘Cause most people, I
think would agree with you. I would’ve agreed with you 20-years ago, but the direct trial kind of you know, throws some cold water on that concept. – [Nancy] Well certainly a good example of the power of positive thinking and– (audience laughing) – [Ellis] Yeah, yeah
it’s expectation bias. – [Nancy] But it’s all about, you know, context-specific issues, and your point, I think, speaks to the fact that we don’t know what we don’t know, and that’s true, but it’s quite an unusual story. – [Ellis] Yeah, I think my own view is, the more desperate the patient, and the more gee-whiz the procedure, the stronger the expectation bias. (audience laughing)
You’ve got cells, genes, lasers, maybe throw in some
artificial intelligence. (audience laughs)
You’ll get your expectation. – Yeah, thank you.
– Yeah. – [Ellis] You want that. And then when you do a thoracotomy, you really want it to work
’cause they’ve invested in it. – [Rita] So you’re arguing
for placebo controls? – [Ellis] Oh yeah, I won’t call it… I would call it a sham, yeah. – [Simon] Can I also
just make the comment… I think it’s important to
highlight that I don’t believe these pragmatic trials
should be seen in isolation. I mean, these are not to
replace explanatory RCTs, these are to complement them. So I don’t think we’re talking about replacing explanatory RCTs, but if you have explanatory RCTs, and you do some pragmatic trials or trials in a health care setting
with a randomization component and see similar results, I would argue that’s fairly convincing. – [Greg] Yeah, yeah. Okay, great. Jesse? – [Jesse] I think you’re
alternating sequence here again. – I am, yeah.
– Thanks for the advice. – Yeah. (audience laughs)
– I got more stats. – [Greg] I knew what was gonna happen– (person murmuring)
– Jesse Berlin, Jesse Berlin, Johnson & Johnson. So, as long as we’re telling stories, and I wish I had a publication to cite, but one of the people on my doctoral committee was Tom Chalmers, who
is kind of one of the fathers of the modern clinical trial. And he told a story which
they never published, that they did a study where they took a bunch of statisticians, they generated data, which were known. So no difference between
the treatment groups. They gave half the
statisticians, they said, group A is active, group B is control, and the other half,
they switched the roles. And there was a tendency to find in favor of the active treatment, no matter which one they
called the active treatment, when in truth, there was no
difference between the groups. (Greg murmurs quietly)
So, it’s anecdotal evidence, but, you know.
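The setup in Chalmers’s story, truly null data with the “active” label swapped between halves of the analysts, can be sketched in a few lines of Python. This is a hypothetical simulation, not anything presented at the workshop: with a genuinely zero treatment difference, the arm comparison should clear the usual significance bar only about 5% of the time, so any systematic tilt toward whichever arm is labeled “active” reflects analyst expectation, not the data.

```python
import math
import random
import statistics

def null_trial_difference(rng, n=100):
    """One simulated trial in which both arms share the same true outcome
    distribution, i.e. the true treatment difference is exactly zero."""
    arm_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    arm_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(arm_a) - statistics.fmean(arm_b)

def false_positive_rate(n_trials=2000, n=100, seed=7):
    """Fraction of null trials whose arm difference exceeds the usual
    two-sided 5% critical value; by construction this hovers near 0.05,
    no matter which arm is labeled 'active'."""
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # SE of a difference of two means when sd = 1
    hits = sum(abs(null_trial_difference(rng, n)) > 1.96 * se
               for _ in range(n_trials))
    return hits / n_trials

rate = false_positive_rate()
print(round(rate, 3))  # near 0.05: anything beyond that is the analyst, not the data
```

Swapping which arm is called “active” only flips the sign of each simulated difference, which is exactly why the tendency Chalmers observed had to come from the statisticians rather than the generated data.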
– Great, thank you. Go ahead–
– Prespecified analysis plans. (audience chuckles)
– Yeah. – [Cindy] Cindy Girman
from CERobs Consulting. Thank you to the panel for
some really thought-provoking comments, I think to me it
all comes down to the research question that we’re trying to address and I think we heard that
this morning, as well, in the fact that to me there
is no one-size-fits-all where you either blind
everybody, triple-blind it, or you don’t blind at all. There’s a lot of middle ground in between. There are a lot of ways to
blind and reduce bias without having that triple-blinded clinical trial. And if we’re trying to answer a question in the real-world setting, you know, does this drug work versus
standard of care, or whatever… If you go into a physician’s office, they’re gonna give you a prescription. They’re not gonna blind
you to that prescription. So if your behavior changes, you know, it’s gonna change in
the real-world, right. So if we’re trying to address
a real-world question, it seems like we should try to address it as a real-world question. And I would say, just to echo something that Nancy said as well, there are ways to use
quantitative bias analysis and other types of analyses. I know Peter, you said, can’t really make it
robust if it’s not robust but there are ways to estimate the bias. And I think we should be thinking about, will it change the results? Do you have a large enough effect size, and large enough sample size, to where it wouldn’t change the results one way or the other. And if it wouldn’t, we
probably don’t need to blind. It is expensive, it is costly and complex. If it will, if you have a
small enough effect size, you know, maybe you should blind, or maybe you should be thinking
about a different design. – [Greg] Thank you. – And I could just, couple comments, I mean, I think you
raise a very fair point. And sometimes our trials are
designed to try to figure out what is happening in a real-world setting that may be different from a regulatory decision that needs to be made. I think in the context though, we have to recognize that the interaction in a physician’s office and
how a patient is interacting with his physician is very
different from the setting in a clinical trial in many
ways in terms of what we’re trying to achieve and the
influence the physician has can be very beneficial. That interaction, both the
medication and the interaction we know has a beneficial effect. The question is, is it going to be a differential
effect when you’re having a trial and will that
affect one arm more than the other,
that’s the problem, not so much the issue of
whether or not there is such an effect, and whether it’s important in the clinical paradigm,
where it is important. And so I don’t disagree with you. In a clinical setting that’s
part of the therapeutic benefit is that interaction with the patient. But when we’re trying to answer a question, we want to isolate the effect of the drug from that therapeutic interaction, unless that’s the (mic thuds)
question that you’re asking, and sometimes it is, and sometimes that can be quite relevant. – [Greg] Bob? – [Bob] I guess one observation is that most effect sizes are small. It’s too bad, but it’s usually true. I have a word complaint. These trials are all being described as, drug versus standard of
care, that is incorrect. The patients in both groups
are getting standard of care and you’re adding a drug to one of them and comparing it with no addition. It’s not placebo control ’cause
you haven’t used a placebo. But these are not comparisons
with standard of care, these are everybody on standard of care with and without drug. So it’s drug versus no drug. Not standard of care.
– Well, wait, wait. – No.
– But, hold on for a sec. You say that so glibly. My understanding, and what we
see in our pragmatic trials is you assign, treat… (stammers) You have a condition
of clinical equipoise, or treatment equipoise and
you assign a treatment, and then you, the patient
either gets that treatment. And I agree that standard of
care is a bit of misnomer, ’cause there’s no standard, it’s just whatever the doctor
would’ve prescribed otherwise. – [Bob] Aren’t the people
getting the new drug also on standard of care? – They are–
– I’ll bet they are, I’ll bet they are in this one. – Sir–
– They’re not getting an experimental drug or (stammers), they’re not the focus of the question. – [Bob] In most of these
kinds of things you’re adding either a drug or no drug
to the standard of care, and everybody gets standard of care. – [Simon] So in this
specific SEPRA trial, eligible patients are patients who are in need of further treatment escalation. So if they decide to participate, they will be randomized
to either semaglutide, or something else. So they will add something
else (Bob interjecting) and that’s what we call
standard of care, so– – [Nancy] So new drug
versus a new treatment. You know, what.
– So in that one, if they didn’t get liraglutide, they got something in addition. – [Simon] They could get
liraglutide if they wanted that. Yeah, in the standard of care. Any other drug than semaglutide. So they are… Both arms are escalating, you could say. – [Bob] I see. So why do you expect to see a difference? – [Simon] Um, that’s– – [Bob] You don’t suppose…
if the one group gets randomized to liraglutide, or takes liraglutide, then you won’t see a difference. – [Simon] We believe so, based on the effect size of semaglutide. – [Bob] I see. Well, in a lot of these things, it is drug added to standard of care versus standard of care alone. There’s no placebo, but that’s what it would be in a placebo-controlled trial. This is just a no-treatment trial. I had one other question. I didn’t… I came a few minutes late,
so I might’ve missed it, but what was the purpose
of the SEPRA trial? You already knew that the
drug affects glycemic control. (Greg coughs)
Was it to look at these other things like health resources, and other systems and (mumbles) I don’t quite understand
the purpose of the study. – [Simon] Yeah, so, so– – [Bob] When you already knew what it did. – We know the effect of
semaglutide when it is studied in a highly controlled,
somewhat artificial setting, where drug is provided for free, and you have frequent trial visits. It’s back to why we
wanted to do this trial. We wanted to see how this drug fares, in terms of true effectiveness
when you introduce it into a setting in which
it’s intended to be used when it’s on the market. That’s one answer. The other answer is that
in this trial concept, we’re able to look into
other novel endpoints: health care resource utilization,
adherence, persistence when you don’t get the drug for free, so stuff like that. So there were two, you could say, key objectives for doing this trial. – [Bob] Okay, but you’re only
gonna see a benefit if the effect in the treated group is
different from the other one. So the other ones better not
take something that works just as well or you can’t
possibly see a difference. – [Simon] That’s the real-world, right? (audience laughs) – I know it’s the real-world.
– But that’s the risk– – [Bob] But you already
know the answer to that. If somebody, if you compare
one drug with another drug with the same effect you’re not
gonna see a difference, duh. – So (chuckles) but they are
taking a different drug, right? So and you can say, I
would argue we don’t know, that’s why we’re doing the trial. Nobody has done this before.
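Bob’s point can be written down as a power calculation. This is an illustrative normal-approximation sketch with hypothetical numbers, not anything from the SEPRA protocol: as the true difference between the compared drugs shrinks toward zero, the chance of “seeing a difference” collapses to the false-positive rate.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sample(delta, sd=1.0, n_per_arm=500, alpha_z=1.959963984540054):
    """Approximate power of a two-sample comparison of means when the true
    between-arm difference is `delta`, using the usual normal approximation
    with a two-sided 5% critical value."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = delta / se
    # probability the test statistic clears the critical value on either side
    return norm_cdf(z - alpha_z) + norm_cdf(-z - alpha_z)

# As the true difference shrinks to zero, power falls to alpha = 0.05:
for delta in (0.2, 0.1, 0.05, 0.0):
    print(delta, round(power_two_sample(delta), 3))
```

At `delta = 0` the “power” is exactly the 5% type I error rate, which is the formal version of “if the other arm takes something that works just as well, you can’t possibly see a difference.”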
– So it’s really a compare semaglutide with anything else? – Yeah.
– Seems like a washout. (audience laughs)
– Thank you. Adrian?
– All right. It’s always good to hear Bob
being continuously optimistic. (audience laughs) So this actually, Nancy kind of touched
on this a little bit, but I wanna get the question
out there in terms of the value of information here and so
maybe specifically for Simon or Nancy, like, in terms of, we talk about blinding as like, you know, it’s actually simple. Like, we have infinite
resources, et cetera, and it seems like that’s
not necessarily the case. So can you describe what
is actually the level of cost difference between what you did when designing SEPRA versus if you had done the same objectives and hadn’t blinded in terms of either total cost change
or time, because I think, that’s one of the key
questions that comes up. In a perfect world, without
any limits on resources, sure this would be terrific, but it does limit how many
other questions we can answer. And then also, in the
context you know, certain, blinding may be really
important in early development, but later in lifecycle, the
safety profile is pretty well established and so, like you know, it’s kind of chewing on the ends. So I don’t know, Simon,
what would be the cost? Did you guys ever think about that? – [Simon] We didn’t think about that. That was not the deciding
factor, actually. I’m sure, it would be much more costly– – [Adrian] Could you give
a, I guess like, I mean– – Let me make an–
– 10% or 20%? – [Nancy] Let me give Simon a
hand here, if you don’t mind. I work on the CRO side, we get
this question all the time, and people come to us late. I’m glad you brought up the
point about label-expansion versus new approval.
– Yeah. – ‘Cause I think that’s an
important part of the context. You have a lot more information, it influences your tolerance for risk. But what we see is when
clients come to us and say, I wanna do what I call a classical double-blind
randomized control trial, and then they see the price and they go, I don’t know, what else can we do, and how can you make it real-world? (audience chuckles)
What we’ve seen time and again is the
cost is roughly half. Half, so it’s a big number
and that’s not doing any fancy Adaptable stuff.
– Right. – You know, that’s a classical,
still they come to the site, they get the treatment, they come to the site
for all the follow-up. So I think that cost
implications are large, and it’s a half or less. I think there’s also a
generalizability point that we’ve talked about
a little bit in here and the sites that you
go to who can store drug, monitor drug, account for the drug supply, have the staff for that heavy
infrastructure are not… I think what you see, and I’m guessing, but what you see, that for
the benefits of the treatment, in such a high, such a fancy
setting with so much equipment and skills and monitoring is different than when you go to the community center. And where, a lot of the
questions we’re asking about comparative effectiveness is, how does it work for the rest of us? For the complex patients and the patients who don’t
go to the big, big centers. So I think you lose generalizability and it certainly costs a lot more. Not saying it’s an answer for everything, but it is the answer for some.
– I mean, it’s costing time. And just to underscore that
kind of generalizability, like at Duke, if it’s a blind study, like for where the people have to go to get the investigational drug, it’s literally in a basement, in another building,
away from the hospital. If it’s open then it can
actually be done right there, and so in certain settings
that could be really important. It actually speeds everything up. – Okay.
– Can I just make a, a probably a somewhat
controversial comment? This issue of generalizability, I think that one of the
things that’s worth studying is this issue of generalizability. There’s this sort of understanding,
I think in the community that traditional randomized controlled trials
are somehow artificial, and the real-world trials
are the real-world, that’s the accurate answer in terms of what you’ll see in practice. I’m not sure there’s much
evidence that is robust that actually speaks to that. Traditional randomized control trials I think can be very
costly, difficult to do, and I think the advantage
of real-world trials is they can be larger, they
can be much less costly, as long as they are sufficiently
robust to get any answers. But the idea that traditional
trials don’t give you the right answer, that
it’s an inaccurate answer, because it’s not generalizable. I would push back on that. I think the evidence that
it’s not correct is limited. And I think the biggest
difference that I’ve seen when they’ve been compared, is adherence. And I think that is an
important point that can be a relatively important
thing to investigate, differences in adherence, for example, for a subcutaneous or the advantage of a monthly versus a daily. Those are relevant questions to ask. Those are different questions
than, does the drug work. One thing that Bob said earlier, which I think is an important point, if you don’t take the
drug, it doesn’t work, that is evident, but that
is the biggest influence when you go to a real-world setting. Now does it not work in
renal failure patients or in patients with cirrhosis or in patients with
different stages of disease, those are issues that should
be examined very carefully. But I just wanna point out
that generally speaking, I think that traditional
trials do give you an answer that is in fact generalizable. I think there’s little evidence
that traditional trials, in fact are you know,
inaccurate in assessing, they have great internal validity, but poor external validity. I think that’s a research question, which has not been addressed, I think it’s worth addressing. But I think right now,
it’s more of a hypothesis. And I’m afraid to say that
I think in this community, it’s more than hypothesis, it’s almost an assumption and
an unproven accepted fact. I think I would challenge
the community to say, if you really believe real-world
evidence trials that are properly conducted will give
you a different answer other than through adherence, I
think it should be studied, rather than just simply perpetuated. Because I think it’s
not well-demonstrated. – [Adrian] I would agree
with that assertion– – I told you that I’d be
kind of controversial. – [Adrian] So it’s
unknown in terms of like, certainly you know the efficacy trials, they generate a robust
answer and so it’s you know, very clear what that answer is. There are definitely questions in terms of other populations. So if you just look at baseline, typically patients who
we’re treating are older, or different and (mumbles), and so there are differences
that we just don’t know– – Sure, and when you look at subgroups, if you look at a forest plot of subgroups, by age, renal function, all
of the things that typically, in large outcome trials,
they look at by subgroup. I would ask you to look
at those carefully, because generally speaking, although not always, the subgroup influences
are relatively modest, and the best estimate of
the treatment effect in a subgroup is the overall effect in the population. And other than random variations, we don’t tend to see huge effects. Now there are clearly substantial and important exceptions to that. Drugs that are, for
example, taken as (mumbles) to inhibitor, of course, we
understand that its effect is contingent upon adequate renal
function and that’s evident. But the–
– I think we’re actually in agreement that we’d like
to go from the two percent of patients we usually see
participating in trials to something more, so then it can be, sure. ’Cause all of the subgroups are typically too small to actually
address those questions. – Sure.
– So, doing larger trials mean what Martin said this morning. – Okay. So thanks. So that is all the time
we have for this session. There will be an open comment period. We’re gonna go to the next session. So actually, first
there’s gonna be a break. Then you’re gonna come back here at 3:05, because that’s when we’re
gonna start this next session. And then right after that, we’ll
have a comment period, too. Thanks for a terrific discussion
on the considerations here. Thank you.
(audience applauding) Okay we’d like to get started.
(crowd chattering) So, let me ask just the
final panelists to join me up on stage, and go ahead, and make your way to your seats, for this next session, thank you. (crowd chattering) That was quick, good, thank you! Okay, in this last session
we’re gonna dive into a little, we’re gonna take a bit of a
turn away from the sort of, the operational challenges and
issues with pragmatic trials that we heard earlier this
morning and then we did dive into the sort of impact and
consequences and considerations related to blinding, now we’re
gonna turn to this topic of, okay, so what does all of
this have to do with our inferences that we actually
make on these treatment effects. For example, issues such as
real-world patterns of treatment including crossover, early
stopping, intermittent use, how do all of these
impact the trial analyses and our inferential statistics
that are calculated. We’ll be considering these issues during this particular panel and then, I do wanna remind you all
that right after this session, we do have an open
comment period, as well. So with that, I’d like
to introduce our panel. Kicking us off will be David Price, the Primary Care Respiratory
Society Professor of Primary Care Respiratory Medicine at the University of Aberdeen U.K., and Managing Director of The Observational and Pragmatic Research
Institute in Singapore, and Managing Director
of Optimum Patient Care in Australia and the U.K. Wow, you’re all over the place. (everyone laughs) And also, Vince Willey is joining me, Principal Scientist at HealthCore. Mark Levenson is Director of
the Division of Biometrics 7 at the Office of
Biostatistics, CDER, FDA. Jesse Berlin, Vice President
of Epidemiology at J&J. And then Lisa LaVange is
Professor and Associate Chair of the Department of Biostatistics, Gillings School of Global Public Health at the University of North
Carolina at Chapel Hill. So I’ll turn things over to David. – [David] All right. – There.
– Thank you very much, and great honor and pleasure
to be here with you all today. You’ve heard about my
rather strange life already, so thank you for that. And I do actually genuinely
have a base in Singapore and Australia but I very
rarely go to Aberdeen, although we (stammers), you might ask why, or maybe why not. But we’ve been doing a lot
of real-world research, I guess for a long time. I think my first published paper with real-world data was 1995. So I think I’ve got the tee-shirt for the most wrongly-done studies,
(audience laughs) as well as, some of maybe hopefully, some of the more helpful studies. I was asked to originally talk about one recently published paper
in JAMA, the TWICS trial. But I thought as I was
exploring some of the issues for this session, that I’d bring in two
other pragmatic trials: one we completed a long time ago, and one I’m about to commence. Well, it brings up slightly
different issues about inference. Some of you may have seen
this paper in JAMA recently. It was a U.K. HTA-funded trial to look at the effect of theophylline in COPD. Theophylline’s a very old drug. Used in normal dosages it
improves patients a bit, but not as well as standard medicines, but it’s a pill and at
low dose was thought to potentially make a difference. So this trial was really
funded on the basis of cheap, old technology, could it
improve outcomes for COPD, which we know has bad outcomes. And this on top of usual
care and on top of basically all of the maximized
therapies that we would normally use for people who have bad COPD. So people with lots of
flareups of disease. And we actually randomized
them to theophylline or placebo
in this real-life design, well because we could. It wasn’t too expensive, it
cost us around 150,000-pounds, it’s not worth much in
U.S. dollars these days. And we were able to send
the drugs out by post and we minimized the study visits, so there were visits at baseline,
six months, and a year, and the outcomes were collected both from electronic health records as well as patient report at the end of the study.
drew up a few years ago. You know the (mumbles) of course, but this was the one where
we tried to think about the ecology of care versus the
type of patients in studies. And Jerry Christian who I
know is in the audience, was also one of the co-authors on this, where we tried to describe
how sure we were about the diagnosis through
to the ecology of care. And really, what we thought about TWICS was that we insisted they
have a good diagnosis of COPD. Not that it was especially
(mumbles) but a good diagnosis. So, but fairly normally cared for, maybe a little better than normal, but so we put it in the middle there. And what did we find? Well I’ve never seen an odds
ratio come out like this, or a hazard ratio, or a rate ratio: 1.00,
adjusted 0.99. It didn’t work. But there were a few interesting issues. I’ve said it was very pragmatic. We were using it in a clinical scenario. We were lucky to be able to use placebo and there was no comparison to study because there isn’t a
comparator in these patients. It was well-covered and it
represented primary care, secondary care, minimal
inclusion criteria, apart from the fact these patients really needed extra medicines. However there’s some limitations. A quarter of the patients
stopped taking treatment. Why? Because we did informed consent, and we gave them a lovely,
long leaflet that taught them all about the potential
side-effects of theophylline. So they all got gastric
upset and stopped it. So we actually had to over-recruit, so we actually had to go back
and get extra money, and put a load more patients in. Now, is that the right thing to do? I don’t know. We also used patient
reported exacerbations, maybe there’s some bias, but
we did verify with the EHRs. No true measures of adherence. So we certainly got a null result. We believe, and I think
we’re pretty certain, there’s no result to be found there. But if the results had been borderline, there would’ve been real question marks.
some other studies, though? Some of you are probably
familiar with this trial that we published a few
years ago in the New England. Again, a U.K. government-funded
trial, to look at a leukotriene antagonist, a pill for asthma, which in efficacy trials versus
inhaled steroids works badly. Well, it works but not as
well as inhaled steroids. And as a result, those
guidelines make statements like this: inhaled steroids are
the most effective preventive drug for adults and older
children duh-duh-duh-duh, however one of the questions
that came up right at the end of the last session, was about, what about real-life interactions? Now I don’t believe that adherence is the only real-life interactive factor. And patients are excluded from
classical asthma trials on a large number of grounds. In fact, 98% of patients are excluded, 98%. So that’s on the grounds of smoking, more than 10 pack-years in their lives; not having severe enough disease; not having other (mumbles), particularly severe rhinitis; not having something called reversibility, a big improvement in lung function. And the question is, does
that bias the results? And so one of the things we
set out to do in this trial, was to really broaden
the inclusion criteria. Try and study some of those subgroups, we didn’t have enough
money to do all of it, but to at least look at some of that. And we are comparing very
different technologies here. We’re talking about a pill
that requires no training. People know how to
swallow pills, generally. Higher adherence in
studies, it works quickly. It works pretty well
in nonallergic disease, treats the nose, as well as the lungs, and might be more effective in smokers. Whereas the inhaled steroids
are taken by an inhaler; patients can’t or don’t
know how to use them, they don’t take them,
the effects are more gradual, and they don’t work in the nose, so you’ve got different technologies. So we believed it was right to
randomize patients to either receive a leukotriene antagonist or an inhaled steroid
at the point of care. Patients had telephone
randomization, normal prescribing, with extra costs reimbursed through the health care system to the payers. And we set our primary
endpoint at two months and our final endpoints at two years. And what we basically saw, and again I would put it here because it was more of
a clinical diagnosis, less confirmed than in the TWICS trial. And what did these patients look like versus standard RCTs out there? They’re a bit older, they have less severe disease, many more smokers, and the other thing is, we have very low dropouts,
four percent over two years, versus other standard trials
around that time, at over 16%. And what did we find?
Absolutely no difference. So are the meta-analyses wrong? Are the guidelines wrong? Well, we actually found it was
because patients took more of the leukotriene antagonist, a pill. People who smoked also did better. Those with rhinitis, there were
some hints of doing better. So many real-life factors interacting. And I think those are
incredibly important to explore. The other problem that
occurred though, is crossover. And we managed to keep it
really clean for two months, and then the problem occurred: you go back, you’re given a leukotriene
antagonist, a pill, not recognized by U.K.
guidelines at that time, at least not in that position, and so you go to see a doctor,
a locum in the practice, who says, what on Earth are
you doing on that medicine? You shouldn’t be on that! And switches them over. They weren’t switched
for any other reason; there were no greater exacerbations. So we obviously did two things, we did a true ITT and a per
protocol analysis to really try to understand, also
getting into the subgroups. So these issues are really important when we think about inference. And I wanna finish with one last trial, which we’re just about
to start, we got funding. I had to spend a long time at
the Ethics Committee last week in the U.K. so I’ve flown all
around the world this week. Tried to discuss what level
of consent was appropriate. There’s loads of studies of
adherence devices out there that show adherence support
improves adherence, but never improves outcomes. So who’s gonna pay for it? Good question, eh? So we spent a long time
persuading a company to do a cluster randomized trial with us, using an add-on that
Propeller Health provides on top of a standard inhaler, for people with COPD who have bad outcomes and poor adherence. And we’re doing this as a
cluster randomized trial, because our question is not, does this device help; that’s not our question. Our question is, what is the impact of an enhanced adherence package? The right drugs, the right add-ons, with the practice becoming
more aware of adherence and starting to think about it more, patients becoming more aware. What does that do to outcomes? Both for those who take up the technology, but also for the broader
practice population who are also managed by the
same doctors and nurses. So that’s why we’ve gone for
a cluster randomized trial, using hard outcomes:
flare-ups of COPD. Now what makes this possible? Well, we’re very lucky, we’ve created a network
over the last 15 years of 800 primary care sites in the U.K., on the back of quality
improvement programs, where we have access to
all of their EHR data. And we also have some
patient-reported information and willingness to
participate in research. So we actually can identify
the right patients for them and help them if they are
in the randomized arm. And what’s important for
me in terms of inference, and I think this slide, I think is my most
important slide, really, is what populations we’re looking at here. Yes, there’s a group of people
who take up this technology. And there’s a group in
the control practice, who would’ve taken it up, when we quiz ’em at the end of the year. That might be our primary population. But there’s also all those
other patients who flared up with poor adherence who
didn’t take up the tech, and then those with poor
adherence who were actually doing pretty well,
very smart of them, probably. And then there are the other frequently exacerbating patients. What are the outcomes for
all of these populations? Not just the ones that
actually take up the tech. And hopefully we won’t end up with another negative trial
of adherence support. Thank you very much for listening. (audience applauding) And the reason I put a question slide up, it’s all about the question, which I’m delighted to
hear somebody else say– – [Greg] Okay, yep. Thank you, Vince Willey. – Great, thanks Greg. And I definitely have a better
appreciation for blinding now sitting in that seat with the lights going on.
(everyone laughs) – [Greg] Surely. – So when we sat down and
discussed what we wanted to talk about in this
particular panel session, you know, one of the things
we wanted to look at was the heterogeneity of the
populations that are recruited in real-world trials and
how that might impact,
you know, causal inference. And what I wanted to
use as an example for you today, to kind of reflect
actual patient populations and the considerations, at least,
that we have gone through in this one particular trial
in how to pick a study population, is the
AIRWISE trial, which is a trial that we’re doing in COPD. Real briefly, the AIRWISE
trial has a planned enrollment of about 3,200 COPD patients. HealthCore is a wholly owned
subsidiary of Anthem, so we have access to the Anthem claims data, and we will have a subset of
folks who we’ll get claims data on as well as some non-Anthem
health plan members. Patients are gonna be randomized either to dual bronchodilator therapy
or triple therapy which is dual bronchodilator therapy
plus an inhaled corticosteroid. It’s open-label, 12-month follow-up. There’s only two visits: there’s a randomization visit, and then a forced visit
at the 12-month mark. Though if they were gonna
have that normally scheduled, then they would go as normal, usual care. Really trying to get it in
community-based physician sites and we haven’t talked
much about that today and probably a little out
of scope today but you know, really trying to broaden this, not only to research
sites and having patients that are, quote-unquote, “real-world”, but doing it in places that maybe aren’t accustomed to doing research, but where most of the
care in America occurs, and I’ll talk a little bit
about using administrative claims data to identify
those practices and patients. And then briefly, our analysis,
it is a non-inferiority design, as you would expect, and the
primary endpoint is time to first moderate or
severe COPD exacerbation. So that’s the basis of the trial. When working with existing data, David talked about some
of his work in using EHRs. In this particular study, we’re leveraging administrative claims data to a large extent. So we helped in developing the protocol. Certainly we found some gaps in care that we thought that
the study could address. Looking at current treatment patterns and I’ll show you how that helped influence our inclusion criteria, as well as, sample size calculations, and ultimately what
our endpoint should be. You know, we looked at for instance, we modeled some of the COPD
exacerbations we were seeing in the population and used that to help inform our sample size calculation. We performed a protocol feasibility assessment: you know, we had some great ideas, but did we have enough patients, and enough sites, to be able to pull it off? And then finally, we actually
used that data to help recruit and reach out to those sites, and help the sites identify
those particular patients. So what I wanna do is then show you, how we used a kind of a
preparatory retrospective study to help inform our design. So I come from a clinical
trials background. I did clinical trials for 10 years. I was the crazy clinical
pharmacist running through clinics trying to get patients and
working with the physicians, and then talking with the
patients and consenting them. And I did tons of respiratory studies, and David will tell you,
in every one of those studies, I learned how to do spirometry. I’ve done thousands of
spirometries in my lifetime. We did that with all
our respiratory studies. But we said, well, what
goes on in the real-world when somebody has a COPD diagnosis, and what we found was 27%
of patients had a spirometry result within one year before
or after they got diagnosed. So we said, okay, well, let’s
take that into consideration. Also what we wanted to look at is, well, what drugs are people on, and these are slides that I presented at the CHEST Annual Meeting
a couple of years ago in a presentation of these results, and one of the things we looked at was the different combination
therapies people were on. And basically, no matter what
level of GOLD status you were, per the guidelines for COPD treatment, as far as combination therapies go, ICS-LABA was by far the
most common combination therapy, and LAMA the most common on the mono-therapy side. So we looked at that and said, okay, that should help guide us, we’ll have enough of a population. So we had a lot of hard discussions when we got down to the
inclusion/exclusion criteria, and ended up, and I
had lots of discussions with potential investigators
on the pros and cons of this, but we did decide to
not require spirometry for the inclusion of the patients. What we said was, COPD was gonna be defined by the study physician: if
that study physician had spirometry results on hand and
those informed the inclusion criteria and their
diagnosis of COPD, awesome. If not, then if that physician was treating them as a COPD patient, we accepted that, and we felt
that was justified, based on what we were seeing
in the real-world patterns, and we wanted to replicate
that in those patients. In addition we decided,
based on what we had seen, as far as drug use goes within patients, that these would be the
appropriate therapies that people then would
be stepped up to one of the two arms that we
were randomizing them to. So trying to show you
here what we tried to do was use existing data to
be able to help identify the inclusion/exclusion criteria. The other thing I would note, when I present this
typically is this is all the inclusion/exclusion
criteria for the trial. You know, when I present
clinical trials typically, I would need four or five
slides and have to truncate; this is all that we have, so again, a very heterogeneous population. So as far as the implications of this, and implications on
causal effect, you know, certainly we talked to
some folks that brought up the potential watering down of effects. And Peter even brought
up the fact of you know, does that really matter,
is it all adherence or not, so you know, I do think that there is something to be said
for watering that down. But I also think that even more important is that we’re going to
be studying groups of patients who are not included
in the clinical trials. And as someone who did clinical
trials and, appropriately so, had a small cohort of patients who went from trial to trial, I was always hoping at the end of the day, gosh, I hope they’re representative of everybody else in the U.S., because several drugs were
approved based on their results. So I do think it’s
important, as, you know, someone who practiced clinically and worked inside a
patient-centered medical home and advised patients and physicians on what drugs to use, to be
able to have that information. The other thing, and I think my fellow panel members are gonna get into it: a lot of the discussions
we’ve had around AIRWISE, and other trials that we’ve
done have really talked a lot about what we do analytically,
as far as adherence goes, you know, crossover, early
stopping, intermittent use. You know, doing analyses that look at both ITT and per protocol, you know, those patients who
adhered to therapy. Other key aspects as far
as causal inference go, are missing data, and what do
you do around missing data, and certainly there’s experts here, that are much better than I, but those are lots of
discussions that we’ve had. So kind of, in summary, what I was hoping to try to accomplish with a real-world example
was to show you, you know, an approach that we used to
try to thoughtfully come up with what we thought was the
most appropriate population, much more heterogeneous than
would be in a traditional RCT, and then these other factors that may impact causal inference. So with that I’d like to turn it over to Greg to go to the next speaker. – Yeah, thanks, Vince.
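A rough numeric sketch of the kind of non-inferiority read-out Vince describes for AIRWISE, time to first exacerbation in roughly 1,600 patients per arm, can be simulated. Everything below is an illustrative assumption, not the trial's actual design parameters: exponential event times, an event rate of 0.05 per patient-month, 12 months of follow-up, and a margin of 1.3 on the hazard ratio.

```python
import math
import random

random.seed(42)

# All values are hypothetical, for illustration only.
n_per_arm = 1600
followup = 12.0        # months of follow-up per patient
base_rate = 0.05       # assumed events per patient-month
true_hr = 1.0          # assumed true hazard ratio
ni_margin = 1.3        # assumed non-inferiority margin on the HR

def simulate_arm(rate):
    """Count events and person-time for one arm, censoring at follow-up end."""
    events, person_time = 0, 0.0
    for _ in range(n_per_arm):
        t = random.expovariate(rate)
        if t <= followup:
            events += 1
            person_time += t
        else:
            person_time += followup   # censored, no event observed
    return events, person_time

d0, pt0 = simulate_arm(base_rate)             # comparator arm
d1, pt1 = simulate_arm(base_rate * true_hr)   # test arm

# Exponential-model hazard ratio with a standard log-scale 95% CI
hr = (d1 / pt1) / (d0 / pt0)
se = math.sqrt(1 / d1 + 1 / d0)
upper = math.exp(math.log(hr) + 1.96 * se)

print(f"HR = {hr:.3f}, 95% CI upper bound = {upper:.3f}")
print("non-inferior" if upper < ni_margin else "not shown non-inferior")
```

With a true hazard ratio of 1 and this many events, the upper confidence bound sits well inside the margin; shrinking the sample or losing follow-up widens the interval, which is one reason dropout matters so much for non-inferiority questions.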
– Thank you. (audience applauding) – [Greg] Turning over to Mark Levenson. – [Mark] Sorry, which is the forward? – The big green one.
– Okay. – [Greg] No the big one, big green one. – [Mark] Okay so I think
I have to start by saying, I’ve been wrong two ways
before I even start my talk. I thought by now, someone
would’ve said something, like a randomized trial becomes an observational study on day two. I don’t think anyone said explicitly that, but I think we’ve heard a
lot of things like that. The other thing I felt
wrong in my mind is, I thought this topic would kind of come out of nowhere from the rest of the day, but I think starting
from Bob Temple’s talk, we heard issues like ITT,
whether that’s relevant or not, and that’s very much what
this is going to be about. So I’m gonna return to that. So I mean, the basis behind
a statement like this is, you know, randomization is great, but right away, things happen,
like, patients drop out, patients have severe or adverse
events, they switch therapy, you lose ’em to follow-up,
you have missing data, all these things sort of
break the randomization, and the lessons we know
from observational studies, which is basically causal
inference, can be applied here. That’s not to say that all of a sudden this is gonna become
an observational study, and we’re gonna, you know, cut
off all the issues with that. But there are some lessons about defining the causal question
that I think would help here, you know, clarify what
we’re trying to get at, and hopefully achieve that better. So I’m not here, well it’s
like Mark Antony would say, I’m not here to praise
an observational study, but I think there is
something to learn from them. So first I’d like to start by showing some actual adherence in two studies, a traditional clinical trial
and an observational study, and I believe in these real-world trials the adherence may fall somewhere in between. So don’t pay attention to
the Kaplan-Meier Curve here, what I’m interested in
is the number at risk. So this is a kind of a well-known study of dabigatran, a novel anti-coagulant versus the traditional warfarin anti-coagulant. This was one of the confirmatory trials to establish its evidence
for effectiveness. And then the bottom you
can see the patients, how many are around in
various follow-up points. So it starts about 6,000 patients for each of the three arms. You lose very few at six
months and at 12 months, you still have lost less than
five-percent of the patients. So this is an ITT analysis so
these patients are still being followed and, not necessarily on drug, but we can see on the next slide, that the patients are
pretty adherent in this. That after one year, generally about 15% of the patients are no longer on therapy. So I mean, these traditional
trials are designed to avoid all the issues that
mess up the randomization, including blinding, as in the last session, but as well making sure
you follow the patients, as well as hopefully encouraging some degree
of adherence to the therapy. So contrast this with a very high-quality
basically the same question. And again, I’m not focusing
on the outcomes here, I’m focusing on the numbers at risk. So this was a comparison of a number of novel anti-coagulants using CMS data, so the numbers are quite
large at time zero. But by six months we’ve already lost, I’m sorry, I get distracted, we’ve already lost most of
our patients by six months, and by a year we only have a
small fraction of the patients. I think, as I said earlier, I think in real-world trials
we’re gonna see something between what we saw in the
last couple slides and this pattern depending on probably
how real-world they are. But you get a question, would you run a non-inferiority
trial with, you know, adherence like this? Just an aside: you know,
there’s a lot of interest in why or why not observational studies
and I think you know, besides the obvious confounding the adherence is gonna be a big issue and that will continue to be an issue with real-world trials using real-world data. So we may not see them always
agree with the traditional trials for a number of
reasons including adherence. Okay so now I’m just gonna
spend the last two slides, as I said, to try to define
what we’re estimating, what the causal question is. And I’m going to use this
framework that’s in this ICH, this international guideline
for statistical principles. It’s based on this concept of estimands, so that’s what you’re estimating. Now, it’s not a very catchy word, but you don’t have to
worry about the word estimand; just think of it as what you’re trying to estimate here. And I’d recommend you
read this, as it’s a very thoughtful document, I think, to explain, to specify what you’re trying to estimate, and there’s basically
these four components: population, variable, intercurrent events, and I’m gonna go into that
in more detail in a moment, and the summary measure. Now, you’d think the population is
a pretty straightforward thing, well, that’s just the
inclusion/exclusion criteria. Well, that’s not always… Oh, sorry. (chuckles) That’s not always the case. For example, with missing data you may not have follow-up on a
representative population. So let me speed up, so intercurrent events, those are events that
occur after randomization. So that’s the stopping therapy, crossover, and a serious adverse event. So these are things, and it’s
particularly missing data, too, that really might
mess up your randomization. So there’s a number of ways
that are discussed in this guideline of how to address
these intercurrent events. And they use different
terminology for various reasons. For example, everyone talks ITT, but actually people
mean different things sometimes when they talk about ITT, and there’s really no such thing as that when you have missing data. But the treatment policy strategy is
basically what we think of as ITT: whatever happens to these patients, that’s what we care about. You know, if they stop therapy two months into a two-year trial, that’s fine, that’s what the real world’s about. There was an interesting
paper by Miguel Hernán that argued against that strategy
for obvious reasons: like, you know, you might be diluting
the effect, or, for example, if half your patients drop out for a serious adverse
event, well (chuckles), if a number of your patients drop out for a serious adverse
event, is your final efficacy really representative of the
outcome of the trial? So even, and this is what Bob
Temple started the day with if no one’s actually taking the drug, is this really what you’re interested in. And I was afraid this would
be a controversial topic but it seems like a number
of people during the day felt maybe it’s ITT is
not the right approach. So that brings us to this
per protocol approach. Per protocol you know, this
is while people are on drug. But unfortunately it becomes a little more complicated than that. Just censoring people
when they stop therapy and then doing the analysis on that data, may not get you the right answer, because the people who are censored, the people who stop therapy,
the people who switch therapy, they may be different from the people who don’t. And that’s where all this
causal inference comes in, and I think things like
instrumental variables, which I think Jesse will talk a
little bit about, are one solution to getting this
per-protocol analysis, but it’s not straightforward, and it often involves some assumptions. And just to mention one last
strategy that’s mentioned in this guideline and that’s
the composite endpoint. So for example, you might
define your endpoint as like, an MI or stopping therapy, so you account for both
the adherence problem and sort of the clinical outcome. So perhaps we might discuss
more of this in the discussion but that’s basically what I have to say. So thank you.
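Mark's warning, that simply censoring people when they stop therapy may not give the right answer because stoppers differ from non-stoppers, can be illustrated with a toy simulation. The frailty mechanism and all numbers below are made-up assumptions: an unmeasured frailty drives both dropout and events, and the true treatment effect is zero.

```python
import math
import random

random.seed(1)
n = 100_000

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Tallies for the as-randomized (ITT) and naive per-protocol comparisons
events = {0: 0, 1: 0}; counts = {0: 0, 1: 0}
pp_events = {0: 0, 1: 0}; pp_counts = {0: 0, 1: 0}

for _ in range(n):
    u = random.gauss(0, 1)                        # unmeasured frailty
    arm = random.randint(0, 1)                    # randomized assignment
    stopped = random.random() < sigmoid(u - 1)    # sicker -> more likely to stop
    event = random.random() < sigmoid(u - 2)      # risk driven by frailty only

    counts[arm] += 1
    events[arm] += event
    if not (arm == 1 and stopped):                # censor treated non-adherers
        pp_counts[arm] += 1
        pp_events[arm] += event

itt_rd = events[1] / counts[1] - events[0] / counts[0]
pp_rd = pp_events[1] / pp_counts[1] - pp_events[0] / pp_counts[0]
print(f"ITT risk difference:           {itt_rd:+.4f}")
print(f"Naive per-protocol difference: {pp_rd:+.4f}")
```

The ITT contrast stays near zero, as it should with no true effect, while the naively censored per-protocol contrast is biased, which is why the guideline's per-protocol strategies lean on causal-inference machinery rather than simple censoring.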
– Great. (audience applauding) – [Greg] Turn to Jesse. – Do I have slides? (man murmurs quietly in response) Okay. See, now every time I come
to one of these, I think, I should really print
out a copy of my slides, so I’m standing up here in front
of everybody with no notes. (audience laughs)
So here I am standing… Oh, there we go. Okay, you wanna go back, I wanna go back. (folks laughing) Okay, it’s causal inference,
not casual inference, I made that mistake.
(audience murmurs) So the first thing you’re gonna
learn, you’ll see my slides and you’ll know, is that I never worked for McKinsey. (everyone laughs) Sorry, had to get that one in. So, since we’ve been talking about this question of heterogeneity
and I just wanna spend a minute or two on this issue
of heterogenous populations. And the main point I wanna make is, how are you ever gonna know
if treatment effect varies across populations, subpopulations, if you don’t include
subpopulations in the study. So you know, maybe it’s not
the first study you wanna do, but at some point you do wanna get a sense for how treatment effect is gonna vary. So and that gets back to the
generalizability question also, ’cause you know, people use
that word generalizability, but you know, we’re pretty sure
that not everyone responds. And so we’re always kind
of faced with that question of where is treatment
gonna be most effective and who’s at risk for adverse events. And I do wanna make this suggestion of a kind of a compromise. And typically, we in our trials, we exclude certain populations, without getting into pregnant
women, let’s just say, people over-75, I was gonna say elderly, but as I get older, elderly
keeps getting older. (folks chuckling) But if you take people over-75, let’s say, who might be excluded from the
initial registration trials, then we go into postmarket
and we know nothing about how the drug’s gonna work, or what the adverse effects are
gonna be in that population. So the ethics of not including
those people, I would say, are pretty questionable, just ’cause, we have no information and you know, then now we’re in a truly
observational study. So my compromise suggestion is, maybe there is a way to
include a broader population. Even if we’re not gonna do
that as the primary analysis, what if we were to include a broader population in the early studies. The primary analysis is gonna focus on the narrow population that
we would’ve done, normally. But now we have some additional
randomized evidence in those populations that we otherwise
wouldn’t understand at all. And you know, the finance guys are gonna kill me for making that
suggestion but too bad. (audience laughs) So now what I wanna do is talk about this, it’s a paper and I’m not
suggesting this is the answer, there’s a lot of assumptions that go into these kinds of analyses. But it does get into this topic
of causal inference methods. And the reason I put this
up is to remind me to say that Yiting Wang is really kind of the brains behind the paper. So Yiting and Marsha Wilcox. So, if you have any technical
questions, don’t ask me. But this was published in
the journal Clinical Trials a few years ago. And it’s trying to get at this issue of non-compliance and crossover. There’s a lot of literature in this area of causal inference, various approaches to controlling
for confounding and addressing non-compliance. And the one that I’ll just mention, ’cause, I’m told, my
experts tell me that this idea of principal stratification which is what we actually use in the paper, it’s principal, spelled
with an A-L, by the way… That there’s a parallel between that and instrumental variables, which is, Bill Crown, are you still around? ‘Cause, don’t kill me for saying
something stupid, you know. Okay, so.
(audience laughs) You can come up and correct me, but the principle behind
instrumental variables comes from economics,
and the theory is that it’s a variable that’s related to exposure, but not to outcome,
except through exposure. And if you take a step
back, so we’re talking about these
large randomized trials, randomization is the perfect instrumental variable, ’cause it completely determines,
at least at the beginning, completely determines treatment, and it should not have
an effect on the outcome except if the treatment works
better than the control. So it’s really kind of an ideal setting. So what did we do? We did some simulations studies. The setting here was randomized trials for cardiovascular safety
for diabetes drugs. And you all know, I assume,
that the principle here nowadays is you have to
rule out an increase in cardiovascular risk of a
relative risk of one-point-three, that’s the regulatory guideline, and that’s a non-inferiority
trial for a safety endpoint. All right, and
in one of the drafts of the study questions the words non-inferiority were mentioned, so I thought I’d call that out. We generated hypothetical trials. These are gonna be big studies. Hypothetical trials of 10,000
total subjects randomly assigned to treatment or control. Treatment, we’re allowing
for treatment discontinuation and crossover and this principal
stratification idea puts people into these bins of
compliers, always takers… Compliers are people who take
what they’re assigned to, always takers or always gonna
take the active treatment no matter what you tell ’em to do. Never takers are never gonna take it, no matter what you tell ’em to do. And then there’s a group that
we left out, which are the, I forget what we call them, but they do the opposite of
what you tell them to do. (audience laughs) So we didn’t consider them. But one of the assumptions
we built into this analysis and there is some evidence
for this in the literature is that people who are poor
compliers are gonna do worse, no matter what drug they’re taking. And there’s a number of
examples of that happening. So we built that in with
varying levels of effect. Two minutes. So what did we find? The causal analysis will always, or almost always removes the
bias due to the crossover, there are a couple of small exceptions. Intent-to-treat is always, except when there’s no treatment effect,
Intent-to-treat is biased. When the upper bound estimates
from the Intent-to-treat analysis were greater
than one-point-three, the corresponding estimates
from the causal analysis were also greater than one-point-three, so we’re okay there. But the flip side is not true. You could have ITT analyses
where the upper bound is below one-point-three but the
corresponding causal analysis, the upper bound can be
above one-point-three, and that happened about 66% of the time. There’s an aside here
which I didn’t mention and I’ll run 10-seconds over, but for a safety question like this, if we screw up because of poor compliance for an efficacy question
then it’s our problem, we haven’t shown that the drug works. If we screw up because of
non-compliance on a safety question, then it’s
everybody else’s problem. If we miss a safety issue
because of lack of compliance, you know, that’s a bad thing. So the point of this is
to understand how bad could the safety problem be
under perfect compliance. So that’s what we learned, that the survival
analysis removes the bias, but there’s a cost, and it’s a
theme I got onto this morning. The cost is, when you start
applying these methods, you increase the variance of the estimate. So that’s where that difference
in upper bounds comes from. You know, you can have a
nice, tight confidence interval for the ITT; when you apply
the causal analysis, you blow up the variance, you
know, you’re fixing the bias. But you’re blowing up
the variance estimate, and so your upper bound goes above. But the message here
which I said this morning, and I’ll just emphasize again, is, you know, we delude ourselves, that was David Madigan’s word, we delude ourselves into thinking
we know more than we know. So part of the message
today is that we really need to be good, better about
acknowledging uncertainty in the analyses when it’s there. (papers rustling)
I think I’m probably done. All right, thanks.
(audience applauding) – [Greg] Thanks Jesse. So we’ll turn to Lisa.
– Yes. So, no button.
– There’s no light. It just works.
– Perfect. Okay, so in the nature of being a true panelist I’m sitting on the panel. Also watching you guys
struggle with the lights. (everyone laughs) I just was asked to close out this session with some comments and reactions. And this is a topic that
I am interested in for a number of reasons, not just
comparative effectiveness, but also my stint at the FDA, where I was involved in the E9(R1)
guidance that Mark alluded to, and all the problems with
adherence, poor adherence, and missing data and what that can do to your randomized trial. And I think with pragmatic
trials, that may be the, at least in the top-two of
the things that can go wrong. But outside of this session, one of my, one of the favorite things I heard was from Steve
Piantadosi, earlier. I hadn’t seen him in a long time. Brilliant clinical trialist and your book is still one of my favorites. But the idea that we go
into the real-world trials and we think that we’ve got these big data sources, access to populations, and by definition we’re
gonna be more real-worldly, but if you filter down, and filter down, and filter down a very big number, and you still end up
with a very big number, it’s no more real-worldly
than a clinical trial if you’ve lost 80% of the people that you started with through your filters. And I think that idea of trying
to get the data structures and go to the results, back earlier, it’s your point
of contact is really quite good and something that we
should probably think more about, especially when I think of
the data structures people are building and the data
models, common data models with big data, and electronic
medical records are almost always the cornerstone of those data sets. But anyway back to this session, the, so Bob Temple of course,
aren’t you glad he spoke at the beginning of the day
and not the end of the day, so we wouldn’t end on
that note? (chuckles) Just kidding.
(audience laughs) But, Bob had some actually
great things to say, as he always does. And he called out this idea
of the ITT, the population, or the analysis method, whichever way you wanna go with that definition. And you know, the, when I, again, when I think about the
purpose of real-world data and using it to generate
real-world evidence when I hold onto randomization
in a pragmatic trial and try and sort of have
my cake and eat it, too, what’s called the treatment
policy estimand in ICH E9(R1), it’s very much not called ITT for a reason I’ll say in a minute. It’s a treatment policy estimand. It seems philosophically to fit, to me. You’re going into an
existing health care system. You’re starting the patients
on some path of treatment and then you’re gonna be
hands-off and see what happens, because you want to see
in a real-world setting. Like in the trial talked about earlier, the Novo Nordisk trial, where you can stop and start
other meds, we don’t care, the trialists don’t care. You get randomized and hands-off, let’s just see what happens, that’s our best estimate of real-world. Well that’s the spirit of a
treatment policy estimand: you randomize and you take the outcomes that you can get and you don’t really pay attention to the intercurrent events, which could be taking a risky med, crossing over to the other arm, because you wanna see what
happens in the real-world. So philosophically it’s lined up. It’s not called ITT in
E9 for a very good reason, because people use ITT in different ways. ITT, if you go back in the
literature was really intended to define a population,
not an analysis method. And it means everybody randomized, regardless of what they took. So it’s as randomized,
instead of as treated. Well how many papers do you read, especially in pragmatic
trials where they say, oh, we did an ITT analysis, but we lost 20% of our patients, but we still did an ITT analysis, well actually you didn’t.
– Yeah. – If you don’t have an
outcome on everybody, right, it’s not ITT. And that’s why E9 purposefully
didn’t use that language. I can’t say it’s ITT unless I
have an outcome on everybody. Now what happens then? Okay, well if I’m gonna be true to that, I’ll have to impute an outcome, if I’ve lost them or
if they’ve crossed over and I don’t wanna use the outcome I have, I have to impute something else. And that’s where things
can start to go wrong, and E9 talks about that a lot. There are good ways and there are bad ways to impute outcomes. A bad way to impute an
outcome would be imputing it as if the person didn’t drop out, or if the person didn’t
have a side-effect, or if the person didn’t
need to take a risky med, because there you’re just
assuming that they didn’t have to do that, but they did have to do that. So that’s a parallel universe that I’m not that interested in, right. And we used to joke, Tom Permutt
is the statistician in CDER, who was our point person
and lead and did a lot of the writing on E9 as well
as the strategic thinking, and we used to joke, that if
we followed that imputation method and called it ITT,
that we would end up approving drugs that only worked for
people who couldn’t take them. And you have to think about
that for a minute, but, ’cause all the people that
drop out get good values imputed, as if they didn’t drop out. So that’s tricky, if you
wanna go the real-world way and use a treatment policy estimand. I think it makes sense
with pragmatic trials, but you do have to have an outcome. And where you get that outcome you’re gonna have to put some thought into. Now the other analysis people have talked about all day is the per protocol. And there’s a niceness about per protocol, because you would like to find out what really happens if
you stay on the drug. And I had lots of conversations
with Bob over the years about these real-world trials, and you know, the one thing I think they can’t do is find out what the pharmacological effect of the drug is. They can find out what the effect of the drug is in a real-world setting. But if you really wanna know what the drug does, if you haven’t learned that earlier in your phases, then a pragmatic trial is
not gonna get you there, for all the reasons that
Bob and everybody else had mentioned, the effect
gets attenuated for lots of different reasons in
a real-world setting. So we know it can’t do that
but that still doesn’t mean that you might not be
more interested in it if you could estimate the drug
in the real-world setting, but account for the
people who can’t take it, account for the crossovers and so forth. And you can do that. E9 talks about how to do it, again, not in a nonsensical way. You don’t wanna take just the people who finish the trial, what we would call the completers and compliers, or per protocol population, because you don’t know
who would’ve completed had they been assigned the
harder-to-take drug. So in the case of David’s
trial with the tablet versus the inhaler, he had a lot
better adherence on the tablet, the people on the
inhaler dropped out more, but would they have adhered better if they’d had the tablet? Now that’s one way to interpret it. The real way to figure that out is with these causal inference methods that observational studies have used, and economists, too,
going back in the history. The idea is to try and
equalize, you wanna try and buy back the randomization
of advantage that you lost by having differential dropouts
by predicting adherence or predicting need for rescue,
whatever you need to predict, but that requires covariates, and some of the covariates are pre-randomization and
some are post-randomization. The post-randomization ones are
really tricky statistically. But it can be done. There are sophisticated methods, there are causal inference methods, they have been shown to work
in observational studies, they can work in a pragmatic trial to get you back to an estimand, a treatment effect that
you’re interested in, not just one you can’t estimate, but one that you really
would like to estimate. So here, I’ll conclude with
my philosophical feeling, and this is after my several years at FDA, and going back and forth
in academia and pharma. To me, the biggest problem
here is that these methods, like the causal inference methods, they don’t fit well with the regulatory need for prespecification. And I think this is a common problem. I worked a lot in rare
diseases, my whole career. Rare diseases there’s a big push to use external control data. And you know, sitting inside the agency, listening to people who
are very distrustful, which is what a regulator’s job is, we have to challenge industry to prove to us that they did the right thing. You know, if an external
control exists ahead of time, how do you prove that you
didn’t just pick the worst outcome so that any drug
would look good, right. Well, how do you pre-specify
when the data exists already? There are ways to do it,
but it’s not our usual way. Randomized trials are so easy. Everything’s prespecified… They’re not easy, they’re very hard to conduct, but statistically: prespecify,
have your analysis plan, and then don’t touch it
and let the trial run. We at least know that we can prove we haven’t done anything untoward. When the data exists already
that gets harder and harder. So meta-analyses are another example. We worked, Mark, Bob and I
all work on the meta-analysis, Jesse you worked on it,
the meta-analysis guidance, and you’re grappling with
exactly the same thing. The trials have already been run. You know what the outcomes are, so how can you prove that you
didn’t just pick the trials that will give you the answer
you want in the meta-analysis. Well we spent a number of pages trying to explain how you might do
that with prespecification, but it’s hard when the
data already exists. I would argue the same, and actually the quality
stuff is similar, too. Because you have to live
with the data you have, you have to live with the trials you have in a meta-analysis, the follow-up may be
differential in length, but more importantly different in quality. So now you’re in a pragmatic
setting and you wanna use data that either already exists or
exists for another purpose. So you’ve lost control. In a prospective randomized trial, you control everything. You control the duration of follow-up, you control the quality. Here you’ve lost control of that, and you lose some of the
advantages of randomization, even when you randomize
inside a pragmatic trial and you have to convince the regulators that you’re doing the right thing, that you’re not picking things to make the results look good. So I just, I saw that as
sort of a common theme across all three of those problems. And these, it kind of comes home with the causal inference methods because they’re hard, they’re
difficult, but they work. But do they work, can they
work in a regulatory setting? We have to figure out a way
to mimic prespecification so that we can use the
hard methods to buy back what we lose when randomization breaks. So that’s sort of my ending
philosophical statement. Thanks.
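The "buy back what you lose when randomization breaks" idea from the talk above can be sketched in a few lines of inverse probability weighting. Everything below is simulated and hypothetical (the arm labels, the severity covariate, the outcomes); it is a minimal illustration of the weighting principle, not the specific estimation methods ICH E9 discusses.

```python
# Minimal inverse-probability-weighting sketch (all data simulated,
# all column meanings hypothetical). Dropout is commoner among severe
# patients, so the severe patients who DO stay get up-weighted to
# stand in for the ones who were lost.

# Each record: (arm, severe_at_baseline, outcome_observed, outcome)
patients = [
    ("drug", 1, True, 4.0), ("drug", 1, False, None),
    ("drug", 0, True, 6.0), ("drug", 0, True, 8.0),
    ("ctrl", 1, True, 2.0), ("ctrl", 1, False, None),
    ("ctrl", 0, True, 5.0), ("ctrl", 0, True, 5.0),
]

def prob_observed(records, severe):
    """Estimated P(outcome observed | severity stratum), pooled over arms."""
    stratum = [r for r in records if r[1] == severe]
    return sum(r[2] for r in stratum) / len(stratum)

def naive_mean(records, arm):
    """Completers-only mean: ignores who was lost and why."""
    ys = [y for a, _, obs, y in records if a == arm and obs]
    return sum(ys) / len(ys)

def ipw_mean(records, arm):
    """Weighted mean outcome in one arm, weight = 1 / P(observed | stratum)."""
    num = den = 0.0
    for a, severe, obs, y in records:
        if a == arm and obs:
            w = 1.0 / prob_observed(records, severe)
            num += w * y
            den += w
    return num / den

print(naive_mean(patients, "drug"))  # 6.0: completers only
print(ipw_mean(patients, "drug"))    # 5.5: severe completers up-weighted
```

The contrast between the two means is the point: the naive completers-only estimate drifts toward the easy-to-retain patients, while the weighted estimate pulls the under-represented stratum back in. Real applications would model the observation probability with covariates, including the statistically tricky post-randomization ones mentioned above.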
– Great, thank you. (audience applauding) Okay, so great presentations, we have 15 minutes left on this panel, if I learned anything from
the previous sessions, I should go to the audience now, because I always end up telling people that we can’t get to them. And watch nobody have
comments in this session. (audience laughs)
So if any questions or comments, we tackled a lot
of issues in this particular session with regard to not just, you know, a lot was on this sort of causal inference methodology but also we think about earlier things that we heard today about the you know, if you’re unblinded or if
you have a lot of measurement error on your outcomes or you
have a lot of heterogeneity in the clinical practice patterns that are reflected in the real-world. What does that all mean for getting to reasonable estimates of treatment effect that we can believe are trustworthy? So that, you know, any comments
on that from our audience, and then Lisa, you talked a lot about, and this did come up earlier with Nancy, and Nancy has talked before
today about things like, as treated versus intent-to-treat and this sort of per protocol, so Lisa, you brought up a
lot of issues with that, so maybe there are comments from the fellow panelists, or others in the room to
sort of weigh in on that. So with that I’ll turn to Bill. – [Bill] I’m Bill Crown, OptumLabs, and appreciate the panel,
it’s terrific and Lisa, your comments I think, really summarized a lot of the issues beautifully. I just wanna kind of reiterate that, I think we do have sort of a
lot of this experience to draw upon from the observational
world and so we get this randomization on the front
end in these studies, but now we’re sort of presented with a
lot of the same design issues as we get non-random attrition
and we get these you know, issues about how to deal with
adherence and where they’re measured over the same
period as the outcomes, and so forth, and the
thing that strikes me is that we’ve got these
different causal frameworks. So there’s you know,
there’s the Judea Pearl directed acyclic graph people and
there’s the epidemiologist propensity score and related
methods and the econometricians with instrumental variables
and simultaneous equations and there’s even machine
learning now, you know, targeted maximum likelihood, which
is beginning to develop causal inference methods in addition
to just prediction methods. And I would sort of
encourage us to think about what is there, sort of a
unifying way to bring these different methods together,
to think about you know, maybe the directed
acyclic graph kind of idea as sort of a mathematical
ideal that we would like to be able to get to
but then it breaks down, because there’s two way
causation and things like that, that economists and
econometricians deal with. And then there are issues that
are being raised about how propensity score methods may inadvertently introduce colliders, that, as you saw, they introduce issues that you know, people just aren’t thinking carefully enough about. So I think we can learn
from these different causal inference perspectives
and it’s just something I’d make a comment about that
I think we should think about. – [Greg] Thank you, any comments, David? – I can’t speak about the methodology at all because that’s beyond me. But I think one of the
things it points to, and I have this all the time
with my own statisticians is, that it’s great to try to solve things after the end of the study. But it’s an awful lot better if we try, and get the data right at the beginning. And so I think that there
needs to be a very big push, as part, as we start to do
these pragmatic designs, to get as much characterization
done at the beginning. To keep as many patients in as possible. It’s really hard work. We managed to get it down
to two percent in one of
and if you can do that, it really makes a big difference. And in fact one of the lessons I think for the real-world stuff
there (mumbles) RCTs, is actually that loss to
follow-up with patients when they’re withdrawn from
drug is one of the things that I think we should be challenging
that way around, as well. So I think we need to get that bit right, and then the methodologies. Yes they’re going to
be incredibly important but I think we need to try and
have as much data as we can to enable that and to expect the dropouts, to expect the crossovers,
and anticipate them, and really work with it rather than just to try to solve it with our analysis. That would be my personal thought. (Greg speaking over Lisa commenting) – [Lisa] One quick comment
to that if I could. So if you read ICH E9, the gist of this, by defining estimands,
but what you estimate, you can also just call
it the treatment effect. The whole gist of that
is to do it upfront, and to plan ahead for how you’re gonna handle these events that might happen so that you’re not doing
it at the end of the study. – Yeah.
– And that alone I think, was one of the biggest contributions, don’t wait ’til it’s
done and then it’s like, oops, I got missing
data, what am I gonna do? – Yeah a lot of methods
were mentioned there and I won’t speak, I won’t address, which might be the better method or not, but I think what Lisa was
getting at in her main comments was what would work
in the regulatory world. And obviously I think the most important thing is prespecification. But there are other aspects, too. Whether the assumptions can be tested. Whether you have the data necessary to implement these methods, like in a pragmatic trial you might be collecting less data, so you may have less information on why people drop out, and so on. So you know, there are a
lot of methods out there. But I think there are certain attributes, like prespecification,
appropriate sensitivity analyses, appropriate diagnostics
that will be important to make this work in a regulatory setting. – [Greg] Great. Okay, great thanks, next question. – [Jayrum] Jayrum from Pfizer. For Lisa and perhaps Mark, what do you think the value of a registry? I’m thinking again in
the rare disease area, what is the value of a registry, where voluntarily physicians
in different parts of the world are
contributing to a database, which is then analyzed, what do you think?
of registries as being extremely valuable as a
source of control patients, or as a way to understand
the trajectory of a disease, and they’re used quite a lot
in the study of rare diseases. They could also have the outcome data. I think people tend to
think of pragmatic trials being done in EMRs and
claims database environments because of the availability
of the outcomes. Whereas I think of the
registry as being more complete on the patient-side
and the recruitment-side, but I don’t know, it depends on how rigorous the registry is about data collection, whether they have the outcomes or not. But, I… And I obviously haven’t
any experience with a pragmatic trial that’s
built off a registry, as much as other databases. But other people might
have something, David? – Not that we’re running, no we’re not running any
pragmatic trials (mumbles), but we’ve actually got an EMA-approved study on the back of a registry for the safety of one of the biologics in severe asthma. And the original plea was for a special study, which didn’t make any sense when there was a perfectly viable large registry running globally, which we were able to amend to produce that data. And I think there are
some real opportunities around doing things like that. There’s the challenge of trying to run a pragmatic trial in multiple countries, but I guess it does make sense for rare diseases because you can build the numbers. And so I could see it happening,
and it’s a great idea, but it hasn’t been done,
that’s all I know of. – [Mark] Hi, I’m Mark, I’m with the FDA. A question about these large
pragmatic studies where you’re tracking the use of a drug postmarket, you’re following along in a real-world design looking at the use of the drug. You see the drug, initially
patients start to dose, and they’re using the drug as prescribed, the number of units
that they’re using over time is consistent but then it trails off. And the question is, why are they reducing
the use of that drug? And is this, in effect, dose-ranging? Are they adjusting the dose because their disease is getting better? It may vary from drug to drug. Or is it because the drug price is so high that they’re adjusting the dose to titrate their (stammers) drug use, because they’re paying out-of-pocket, or third-party payers are paying for it? How do you tell the difference between one possibility versus another
in a real-world setting? Are there tools that
you can use to look at, I guess you can figure out
whether the patients are getting worse while they’re
taking the drug or not, or if they’re getting
other drugs on top of that, but are there other ways to
look at why drug use, (stammers) is it really not compliance
or are they trying their best to comply but they’re just
adjusting their doses? – Yeah, we’ve heard a
lot of this here tonight. This, it’s multi-factorial,
is the big problem. And there’s the whole patient
beliefs about medicines whether they feel they should
be taking something longterm. If they feel they should
have drug holidays, but there are also other
things that go on. And one of the things that, actually, no one criticized me for in the Elevate paper, and I thought they should’ve done, was actually patients
self-treat to target. You know, they have a personal
target of what they wanna achieve and they take enough
medicines to reach that. So actually you could argue they took more of the pill to
get to the same effect as they got with the inhaler. And actually they then made a choice of leveling off at that amount. Now, I don’t know. But I think there’s a
bit of that in there. And so I think you’ve gotta,
to try and explore that in future works, certainly one
of the things to try and look at is getting some attitudinal
work from the patients. Because particularly when we start then on adherence much more effectively. It starts to become possible to say, okay, what was going
on with your thinking? And there are some very good beliefs-about-medicines questionnaires out there to really capture what people think. So I do think they can be
brought into the picture, but nobody’s truly done
it at this point of time, alongside a study, which we’ve done in real-life settings, but not really alongside a study. – Vince?
– Yeah. Just a couple of comments. One is the answer to all
those bullet points you put out are yes, yes, yes and
there’s a longer list. (audience chuckles)
A couple things that we’ve tried to do in our
studies to pull that out. One is you know, certainly
trying to keep costs equal between groups and
typically what we’ve done by doing that is make it a little bit
less than what they would normally pay out-of-pocket
based on their insurance, but still having some skin
in the game, if you will. So we’ve tried to modify that, but still try to keep in
some real-world elements. Second we do and some of the
studies have some kind of satisfaction questionnaires. We really struggle, we don’t
wanna be too intrusive in there, but in some of the
studies, we have tried to pull those out and some of those
studies are going on right now. And lastly, analytically, we’ve
talked a lot about the fact, if you don’t take a drug
you don’t get the effect, which I completely agree with, but one of the things we try to do. And many, we do tons of
sensitivity analyses, we do try to take a look and see varying levels of compliance, and what that means to outcomes. So great question. – Can I? I wanna jump in on this one, too. So just elaborating a little
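The compliance sensitivity analyses just described can be sketched very simply: re-estimate the mean outcome as you restrict to progressively better compliers and watch how the answer moves. The records, compliance fractions, and thresholds below are all hypothetical illustrations, not data from any study discussed here.

```python
# Sketch of a compliance-threshold sensitivity analysis (simulated data).
# Each record: (fraction_of_doses_taken, outcome); both hypothetical.
records = [(0.95, 10.0), (0.80, 9.0), (0.60, 7.0), (0.40, 6.0), (0.20, 5.0)]

def mean_outcome_at_least(records, threshold):
    """Mean outcome among patients at or above a compliance threshold."""
    kept = [y for c, y in records if c >= threshold]
    return sum(kept) / len(kept) if kept else None

# How does the apparent effect move as we restrict to better compliers?
for t in (0.0, 0.5, 0.8):
    print(f"compliance >= {t}: mean outcome {mean_outcome_at_least(records, t):.2f}")
```

If the estimate climbs steadily as the threshold rises, as in this toy data, that is consistent with the "if you don’t take the drug you don’t get the effect" point made above; the sensitivity analysis makes the dependence on compliance visible rather than hiding it in a single number.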
bit on what David said, Mark showed some of
the anticoagulant data from an observational
study where there’s this huge drop-off in who’s staying on drug. Which we also see with rivaroxaban, one of our drugs, it was on the slide there, and my colleagues in
medical affairs tell me that people don’t feel sick. Even people with AFib,
they’re not feeling sick, so why do I wanna stay on this medication. It’s expensive, it’s a pain to buy, I gotta go to CVS every month. And even with antibiotics
I think you see that people don’t complete the
full course of antibiotics. Once they start feeling better, they stop taking the drug, and then, you know, five years later there’s
antibiotic resistance because of that sort of thing. The other comment I want to make is, even in randomized trials, this is just one of my pet peeves, I spent my first few years at J&J, reading a lot of study reports,
and every study report, here’s the list of reasons
for discontinuation, and it’s adverse events, lack of efficacy, and a couple of other ones that are specific and then patient choice. It’s like, what, we saw one this morning, where patient choice was on the line. Okay, the patient wise,
the patient choosing. It’s gotta be one of these reasons. So.
– Great, thanks. Next question.
– Hi. I’m Oftah Shonakuh from Sanofi. We have an expert panel here, and as we develop and bring new medicines to market, in the clinical trial program we are trying to get real-world evidence earlier and earlier. My question is, when we launch drugs, and as they start to
pick up in the market, the prescriptions are few, how in those cases would you
suggest or advise you know, designing studies to
generate real-world evidence in earlier utilization of
drugs, new launch drugs? – The short answer–
– (mumbles) Jesse. – Yeah the short answer is be patient. (audience laughs)
Yeah. But more seriously, what we
do typically now, is we plan, we go through the exercise of
the sample size calculation and so sometimes we’ll also plan, it’s hard to call them interim analyses, but we’ll kind of look
along the way without doing an analysis that generates a treatment effect. But we say, we’re going
to pull the trigger. You know, we write all
the code and we say, we’re gonna pull the
trigger on the analysis, when we get to this many events. So we’ll pre specify exactly when, and we have a detailed
analysis plan that’s, we post our protocols in advance. And we stick to them. And then you know, we get to 603 events and that’s when we run the analysis. So it’s a way of kind of
protecting yourself against picking at the data over
and over again until you get the answer you
want and then publishing. – [Oftah] Thank you, that’s a
fair point because in an RCT you don’t want to do too
many interim looks without– – Right.
– You know. But in an EMR data refresh
maybe that’s a possibility– – Right, the temptation is always there. And we spend a lot of time as an epi group, surrounded by people, well, we have a lot of internal discussions, let’s just say, about what
the right way to do this is. And what it always comes
down to is you need to write a protocol, you need to
adhere to the protocol, and we’re pushing more and
more now for posting protocols publicly so that you know,
we’re committed to an analysis strategy, an analysis plan before we ever really look at the associations. You know, we’ll look at the marginal data, so how many exposures do we have, how many events do we have, but we’re not gonna look
at the two-by-two table until we reach the predefined point.
– Great. Okay, so that wraps up this session. I think it was a great
discussion on causal inferences. So thanks to the great
presentations and panelists. (audience applauding) Now I walk up here. So this next and final
session of the day is a little bit different than what
we’ve been doing so far. The one thing is, the length
of it is entirely dependent on everybody in this
room because it’s just, the open comment session. It is an important aspect
of this particular meeting, and the questions or
comments that you all have, and those of you on the WebCast, this is your opportunity
to provide feedback, provide comments that might
inform future planning and guidance development
and this the opportunity to get your comments on the record. Probably not the best session
to ask a lot of questions, I mean, you know, it’s
only me up here, and so, (audience chuckles)
you may not want my answer, although I’ll try to not answer
or comment on your comments unless I feel really compelled to do so. So with that, I’ll open it up
to the floor for any comments or questions, and I’ll ask our
staff if anybody’s Emailing through our WebCast,
questions to go ahead, and throw those up at
me, too, or comments. And so, if there are no further thoughts for the record, that’s perfectly fine. You know, I didn’t wanna push
you into a territory you don’t wanna go to but I’ll pause
for any comments or feedback. Okay, okay. Well there’s, you know,
Beltway traffic and all of that pending as well.
(audience laughs) So I can understand. So okay, so I’m gonna go
ahead and just wrap things up. Okay, here we go, thank you, Jesse. (audience laughs) Yeah. – [Jesse] Sorry, you can’t get rid of me, Jesse Berlin, Johnson & Johnson. You just reminded me that I wanted to respond to Lisa’s question about kind of drawing the similarity to prespecification, so I already pounded on the point about prespecifying
since I run an epidemiology group, we do a lot of propensity scores. And what we’ve actually started
doing which you can do when you have big database is fit
the propensity score first before we ever look at outcomes,
we’ll fit the propensity score and then we can do a lot
of diagnostics based on that to know, do we have a potentially
valid comparison or not. And if not, it’s a little hard
to define the criteria for you know, how do you know
whether it’s good enough or not. But at least in principle, you can say there’s some threshold
above which or below which we’re not gonna run the
analysis because we don’t believe that the
comparison’s gonna be fair. But it is possible in some situations. It’s a little harder when
you have an ongoing trial to do the matching kind of in real-time. But there might be a way to do it. – Okay, great. Okay, thanks Jesse. So I’m gonna go ahead and
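Jesse's "fit the propensity score before you ever look at outcomes" workflow can be sketched with a crude overlap diagnostic. The scores below and the 0.9 common-support threshold are hypothetical illustrations, not his group's actual criteria, which he notes are hard to define precisely.

```python
# Sketch of a pre-outcome propensity score check: decide whether the
# comparison looks fair from the scores alone, before any outcome data.
# Scores and the 0.9 threshold below are hypothetical.

def common_support_share(treated_scores, control_scores):
    """Fraction of treated patients whose propensity score lies inside
    the range spanned by the control scores (a crude overlap check)."""
    lo, hi = min(control_scores), max(control_scores)
    inside = sum(lo <= s <= hi for s in treated_scores)
    return inside / len(treated_scores)

def comparison_looks_fair(treated_scores, control_scores, threshold=0.9):
    # Decided from the scores alone (no outcome data are involved).
    return common_support_share(treated_scores, control_scores) >= threshold

treated = [0.20, 0.40, 0.60, 0.95]
control = [0.10, 0.30, 0.50, 0.70]
print(common_support_share(treated, control))  # 0.75: one patient off support
print(comparison_looks_fair(treated, control))  # False: don't run the analysis
```

Because the go/no-go decision consumes only the fitted scores, it preserves the spirit of prespecification in an already-existing database: the comparison is vetoed or approved before anyone has seen whether the result would be favorable.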
close us up for the day. I have some summaries of some of the things we talked about, so hold on. I do wanna remind you
though that we’re not done, like, tomorrow is another half-day, unfortunately I won’t
be with you tomorrow, but Mark McClellan will
be here, so, you know. (audience laughs)
That’ll be good, too. So, you know, may not
be as, okay. (chuckles) So we did hear, I mean, this
was a critically important set of discussion topics
that we had today. We did dive into a lot of
details with regard to the considerations in using randomized designs to generate real-world evidence on effectiveness. I mean, mostly in this space we’re talking about observational designs and retrospective database analyses, so it was really nice to spend the entire day focused in on using the element of
randomization when generating real-world evidence. The questions that were
addressed today and the comments were very much related to the
issues that the FDA outlined in their 2018 framework
and the discussion today, and tomorrow will help FDA as
they consider these issues, and work toward guidance. But we did hear a lot
on the considerations on how these studies can be implemented. We had a lot of panelists and
speakers talking directly from their own experiences and
implementing randomized studies that do generate real-world evidence. One of the big areas that we did, I guess the main thing that
did come up in the very beginning is let the
question drive the design. And I think we heard that
throughout the entire length of the day that these things come
up but it depends on what the real question is that
you’re trying to get to. With regard to data
measurement and those issues, you know, first and foremost,
I think it was really important that Sean Tunis
brought up, basically what he said was, we shouldn’t just look for the endpoints that are easily measured, or, if we have a data set, just find the endpoints that we can measure in that data set; we should really think about what are the most meaningful
endpoints to look at. There are many methods in
bringing stakeholders, patients, regulators, providers, and
payers together to figure out what are those most meaningful
outcomes with regard to the particular disease
that we’re examining and then try to find the best ways to actually measure those outcomes. Sometimes that might, those
ways to measure those outcomes might come from electronic
medical records, or claims data, or registries and we did
hear that, in comparison to traditional clinical trials, those data elements can help us with the issue of loss to follow-up, as you continue to collect longer-term outcomes and have a way to collect additional data elements for the longer term, and at least measure that and deal with it. We also heard that
clearly there are issues with measurement error
and misclassification, and we did learn that’s not necessarily a problem if you have misclassification, as long as you can measure it and it’s balanced across the groups, and remember that’s an important aspect. We did turn to blinding. Blinding was a wonderful
discussion, lots of considerations, and many instances where we
can’t blind but we did hear a lot of good ideas and
input on what can be done in those situations and we
did hear about the impact, or the sort of the influence
that the objectivity of the endpoint that you’re measuring
has on that, whether or not there is therapeutic equipoise,
whether or not there’s an expectation of benefit, what
that size of the treatment effect is, and heard some
that perhaps measuring the durability of the effect
when you don’t have blinding could be something worth measuring. I won’t summarize the causal inference, I mean, we just had it, so it’s fresh in the minds of all of you, but we did hear a lot of
issues that did come up. And I was impressed with
the degree of methodology and analytic approaches that we have that deal with these issues of heterogeneity in the population, in the patterns of care, and in treatment effect, and with the importance of acknowledging the uncertainty in the analysis. And then, I won’t summarize ’cause
I didn’t quite understand, but the as-treated versus intent-to-treat, versus per protocol,
but lots of issues there to maybe follow-up with Lisa on. So thanks for sticking with us today. Thanks for sticking around the
entire duration of the day. We do look forward to see you tomorrow. Have a safe evening and we’ll
see you back here tomorrow. Thank you.
(audience applauding)
