Shift+6

Danielle@Broad Institute

Episode Summary

Danielle Ciofani leads the Data Sciences Platform team at the Broad Institute. We get into the use of genetics in healthcare, cloud strategies, and racial injustice.

Episode Notes

Today’s guest is Danielle Ciofani. A data architect turned strategist, she leads the Broad Institute’s Data Sciences Platform team maximizing the impact of data science on the biomedical ecosystem. She also cares deeply about the startup community and serves as a mentor to early-stage companies at the intersection of healthcare and data.

Prior to these roles, Danielle designed, built, and grew the largest integrated real-world database of clinical and claims data at Humedica, which was eventually acquired by Optum.

I really enjoyed this conversation. I learned about how the Broad Institute (pronounced “brode”) enables genomic research (1:11) and how they manage that much data via a hybrid cloud strategy (11:20). I found their reasoning behind their decision to utilize GCP (13:58) fascinating. They’re at a point where building a federated multi-cloud architecture (17:18) spanning AWS and Azure is starting to make sense. We discussed infrastructure considerations for international expansion (22:23) and how personalized medicine can combat institutional biases (24:33) to create more equitable healthcare delivery. 

Danielle shares her take on the state of our healthcare system as it shifts under the weight of the pandemic (28:56) and why diversity in building teams (31:51) is so important. We get into what it means to build open-source software (34:08). Finally, Danielle shares resources to dive into for folks getting into the space (39:52). 

If you’re interested in reaching out to her, find her on LinkedIn or her website.

Danielle, thank you so much for spending your time with us on the show!

Episode Transcription

James: [00:00:35] Today's guest is Danielle Ciofani. She leads the Broad Institute's Data Sciences Platform team, maximizing the impact of data science on the biomedical ecosystem. She also cares deeply about the startup community and serves as a mentor to early-stage companies at the intersection of healthcare and data.

Prior to these roles, Danielle designed, built, and grew the largest integrated real-world database of clinical and claims data for Optum. Welcome to the show, Danielle. Well, let's start with just hearing a little bit about the Broad Institute and your team and the work you do there, just to give us a little bit of an intro.

Danielle: [00:01:11] Okay. So the first thing that you should know is that the word is written B-R-O-A-D and pronounced "Brode." So anytime you see it, if you're unfamiliar with the area or you're not based in New England, often people will call it the "broad" Institute, but the thing that your audience should now be in the know of is that it's actually pronounced "Brode."

So, the Broad is named after Eli and Edythe Broad. It's been around for about 15 years, and also a fun fact: there are two Broads. There's the Broad Institute in the East, which is focused on science, and then there's The Broad in the West, in LA, focused on art. And so really the first endowment came to the Broad after the original Human Genome Project.

And the concept that the Human Genome Project kind of illustrated was that the next generation of science is going to be big. Yeah, it's going to require collaboration. If you don't know what the Human Genome Project was, it was essentially funded by the Department of Energy, I think from like 1990 up until the early 2000s.

And it was essentially a multi-year effort focused on sequencing the human genome end to end. And if you're unfamiliar with the human genome, what it is is basically, harkening back to high school biology: we all have 23 chromosomes that, when they're uncoiled, translate to a bunch of DNA.

And then that DNA is transcribed and turns into RNA and proteins and actually becomes all of the functioning aspects of your body. So your DNA is your blueprint, and the Human Genome Project was essentially building a reference map, kind of like a common hybrid of a bunch of humans, something that we could use to bump up against other humans and study where variation is happening and which variation is pathogenic or causes disease.

So the Human Genome Project was this big effort. There were, I dunno, 40 different institutions, all decoding different aspects of the genome. And that effort, you know, it was like 15 years, I think like $450 million. It was an extreme amount of money, an extreme amount of time. Don't quote me on that number.

I think it was bigger, but either way it was huge. It was a grand undertaking, and the Broad Institute was founded because we said, "Oh shit, next-generation science is going to be huge, and we want to be at the center of that, and it's going to require collaboration."

Whereas historical academic work was insulated in a lab and often kind of competitive about who gets the next big finding. So that's the Institute. We're basically a genomics research institute. We have a sequencing facility, so we basically helped to bring the cost of sequencing down and enable a lot of scientific breakthroughs, because you can now sequence a lot of people at a much more efficient cost.

And we're increasingly doing a lot of crazy stuff in tech and data science, because we're at the intersection of bleeding edge, large scale data processing. And that's where I fit in. 

James: [00:04:08] Yeah, very cool. And could you tell us a little bit about sort of the general size of the organization and of your team, just to give us kind of a scope of who you work with on a normal day?

Danielle: [00:04:19] Yeah. Yeah. So at the Broad Institute about five years ago, basically, as we improved our data processing efforts, different parts of the Broad Institute had their own scientific labs. They run kind of like an academic university would, where you have a principal investigator and they're working on some scientific element.

They would hire software engineers or teach their lab members to build software to help them, you know, make everything go faster. And then the sequencing facility also had software engineers building software to make their stuff go faster. And basically five years ago, the organization, the Broad as a whole, looked at this and said, if we're not careful, people are going to be reinventing the wheel and doing things slightly differently.

And it's going to be really inefficient. And I feel like that's a common story that happens in many organizations. And so they piloted moving a lot of software engineering and data science under one roof, and that's how the Data Sciences Platform was started, about five years ago. We've now, with a lot of funding from tech philanthropists, government grants, and even commercial partners, scaled very rapidly to about 200 people.

And our goal is really to bring together all aspects of the kind of biomedical research life cycle. And so that includes partnering with patients to make patient-powered research go faster. So historically, I think we all know that patients don't often own their data; the software systems that capture their data are often the ones kind of negotiating and transacting who gets access to that data.

So we build software to help patients kind of be more at the center of how research is using their data. And then we have a large-scale kind of data processing facility, and that's kind of like ETL and large and complex pipelines and new scientific algorithms, optimized for scale and throughput.

And then we have a platform that is near and dear to my heart called Terra, which is basically an open-source kind of federated infrastructure for large-scale biomedical and genomics research data assets. And so there's a lot of, I feel like, players out in the space that are building privatized, federated data networks.

And we're trying to build an open source version of that that is free to use, but also has a lot of the kind of security bells and whistles that protect, you know, the world's most sensitive research data, which is genomics and clinical data. So that's what we do.

James: [00:06:41] That's very cool. Yeah. And I'm sure we'll dig into a little bit more of the details of the platform here in a bit, but yeah, just to get to know you a little bit more too, how did you find your way to the Broad? Kind of what got you interested in healthcare and data science in general?

We'd just love to learn a little more about your background and the path you took.

Danielle: [00:07:01] Sure. Yeah. So I am a biomedical engineer by training. I graduated from college in Ohio, where I'm from, the Midwest, represent. Back in the financial crisis of 2008, I was sort of just kicking around with a biomedical engineering degree and not really working, doing yoga, working at a bar.

And I finally got a call from Accenture and their IT consulting shop, and they offered me a position in Boston. At the time I wasn't doing much, so I took the call and never really looked back. And since then I've made a career focused on healthcare. Healthcare has always been very close to me.

My mother was a physician. There was a time where I really thought I wanted to be a clinician, and then I realized, being a consultant and studying for the MCATs, that my life wasn't going to get any more fun on a path to medical school. So I nixed that idea and said, well, what's healthcare adjacent?

What does that look like? How do I scale kind of impacting people without having to do it hands-on? And that's where health IT really showed up for me. And so from Accenture, I did a two-year stint in consulting, which really taught me a lot about onboarding and ramping up and adding value very quickly.

And I jumped from there to a startup called Humedica, which was one of the early players in what was called population health at the time. And population health was essentially aggregating all of the disparate electronic medical records that were all in these different kind of mom-and-pop

electronic medical record systems, kind of aggregating them, building a common data model, and then building business analytics on top to surface things back to providers. Things that are pretty simple at the time, right? Things like: here's a list of people who had a heart attack in the past year. One of the biggest risks of having a heart attack is that you've had a heart attack before.

Right. So here's that list, here's the report, go ahead and call them. And so through that kind of work, I got really familiar with different EMR backends and data modeling and the importance of scaling your data infrastructure. But even that business didn't always sit well with me, because we were kind of stewards of data on behalf of the institutions, and patients often didn't know about it and weren't consented, and we were often kind of transacting around the data as if

it was our own, when the implied truth is it's kinda not. And I feel like the world is slowly waking up to that. But what caused me to ultimately leave that company is we were acquired by a large healthcare insurance company, Optum, part of UnitedHealth Group. And the business was really strong, and the data and the opportunities to help were great, but it was all kind of in the context of current healthcare in America, which we all know is pretty frustrating, to put it politely.

And so I moved into a partnership role, kind of focused on business development, sat across the table from the Broad Institute in one of our partnership conversations, and realized that the mission and the vision of the Broad, which is to do things at a big scale, kind of impact globally, give back, empower members of the community, not acting on their behalf,

all of those values really sat well with me. And, you know, the Data Sciences Platform was just getting started. So about three years ago, I left the startup, which was mature by then, and joined my next quote-unquote startup, where, you know, I got the opportunity to pioneer and grow the org and do some different things inside of it.

And yeah, I truly believe in following people. People and kind of vision are two really important things. And I feel very grateful; everywhere I've been, I've had the fortune of connecting with very brilliant and also empathetic and mission-driven people. And I truly believe that right now in my current home.

So I'm very lucky.

James: [00:11:03] Yeah, that's amazing. So maybe you could tell us a little bit about the kinds of challenges that you and the team are working on right now, and, you know, feel free to go as much into detail as you want on the tactical details, as much as you're comfortable sharing. We'd love to hear about some of the things that you all are working on.

Danielle: [00:11:20] So one of the challenges that predates me, but I think is interesting, is, you know, the Broad Institute is a large genome center, is what we call ourselves. Basically, we sequence DNA as a service for members of the research community, for other organizations, for pharma. And in the early infancy of my group, the Broad Institute ran out of on-prem storage.

And just to kind of give you a frame of reference: you know, DNA sequencing, basically there's a lab component, you're pipetting and you're doing wet lab stuff, and then there's a box that is the sequencer. And long story short, what goes in the box are biological samples; what comes out of the box are basically little paper shreds. Not literally, but imagine a novel that is your DNA.

And it's kind of shredded into little sections, and each of those sections is basically very messy data that then needs to be reconstructed back into the novel so that we understand your DNA end to end. And so what goes into the machine is biological samples; what comes out is very messy data.

And so the order of magnitude of that data is, for every whole human genome, that is end to end, all 23 chromosomes, what comes off the sequencer is about a hundred to 200 gigabytes per genome. And the data processing that needs to happen kind of reduces that down to only the things that are biologically unique to you, which we call the variants.

And that is on the order of, you know, a hundred to a couple hundred megabytes. So that kind of gives you a sense of the data processing that needs to happen. And it's very spiky, 'cause it depends on the load of data that's coming through the sequencer, but you also need to keep the original, messy data, which is essentially like 200 gigs per

human sample. And so in the world of on-prem data storage, it was very easy to spend a ton in anticipation of what your system loads were going to be, and then also just basically max out very quickly, especially if there was a scientist who also needed to use that reserved compute for data processing.
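As a rough, back-of-the-envelope sketch of the storage math Danielle describes here (the per-genome figures below are the approximate numbers from the conversation, not official Broad specifications):

```python
# Back-of-the-envelope sketch of the storage numbers from the conversation.
# These are rough figures, not official Broad Institute specifications.

RAW_GB_PER_GENOME = 150        # raw sequencer output: ~100-200 GB per genome
VARIANTS_MB_PER_GENOME = 200   # processed variant calls: ~100-200 MB per genome

def storage_estimate(genomes_per_year: int) -> dict:
    """Estimate yearly storage when you keep both the raw 'messy' data and the variants."""
    raw_tb = genomes_per_year * RAW_GB_PER_GENOME / 1024
    variants_tb = genomes_per_year * VARIANTS_MB_PER_GENOME / (1024 * 1024)
    return {
        "raw_tb": round(raw_tb, 1),
        "variants_tb": round(variants_tb, 2),
        "total_tb": round(raw_tb + variants_tb, 1),
    }

if __name__ == "__main__":
    # e.g. 10,000 whole genomes in a year:
    # roughly 1,465 TB of raw data versus about 2 TB of variant calls
    print(storage_estimate(10_000))
```

The point of the sketch is the ratio: the raw output dominates storage by roughly three orders of magnitude, which is why spiky sequencing loads can exhaust fixed on-prem capacity so quickly.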

And so yeah, we ground to a halt a couple of times, and it was a very big deal, because if you can't do sequencing, then science stalls, and it's very visible, and innovation can't happen. And so about five years ago, that kind of began our first foray into moving data to the cloud. Long story short, we have almost a hybrid infrastructure, mostly cloud: the data is delivered to the cloud.

Most of the processing happens on the cloud, but the sequencers are still on prem, and so there's a little bit of stuff that needs to happen on prem. But the big first challenge for the Broad Institute was figuring out which cloud vendor we wanted to work with. And at the time, we made the decision to work with Google Cloud Platform, which was, no shade to GCP, kind of an unpopular move.

Right? We're a large-scale organization. I think AWS is obviously the undisputed champion in cloud computing and storage. But for us, the Broad, as I mentioned, kind of our core values are collaboration. That is key to what we do. And with AWS there wasn't really room for a partnership model, and that's no shade to them.

If you're in first place, you don't really have to partner. It's just sorta like, you know, I'm your vendor pay me. So, one of the strategic kind of challenges that we were facing was. Who is the cloud vendor that we can work with collaboratively. And, and remember, we are a large team of developers.

looking for developers to sort of join our cause and work with us on these problems. And so GCP proved themselves a very wonderful partner in that way, and we worked together on building what is now known as the Pipelines API (at the time, it was the Genomics API), which was an API optimized for genomic workflows. We still use it today in our platform for a lot of the data management and data processing that we make available to the world.

We make all of our production pipelines available to the broader community, so anyone can process data consistently with the way that we do it. All of that leverages the underlying infrastructure of GCP. So in many ways, they've helped us get off the ground, and we've helped bring them into the genomics research ecosystem in a way that I think the other cloud vendors are now kind of looking at and want to join.

So it tees up the next level of challenges, and this is kind of where we are in the middle of it. So we've built a data management platform, and the way to think about it, I think of it as almost an octopus, where there's one front door that researchers can kind of enter into, and it's, you know, a web application with a data library, and

behind the scenes, each one of those data sources in the data library is managed by a different entity, and potentially on a different cloud, right? So the octopus tentacles are basically that: the data doesn't need to be physically aggregated in one place in order for researchers to access it.

And there's a lot of bells and whistles that go in between that, but that's essentially the working model that we're moving to today. All of the data in our data library are on different kind of instances of GCP, managed by different people. So we work closely with the University of California, Santa Cruz, we work with UChicago,

we work with Johns Hopkins University, and so people are managing data in different places, all on Google Cloud. Now, the reality of the research ecosystem, as I mentioned: AWS, Azure, data lives there, like a lot of data lives there. And so you can't really build a research platform that serves the world

if you are ignoring the two other large clouds in the ecosystem, where a bulk of the genomics and medical data lives, especially if you want to include kind of clinical data from health systems, which is almost entirely, you know, Azure-based. So the thing that we're dealing with right now is: how do we bring the other two cloud vendors into the ecosystem and build a platform that is truly multi-cloud, in a way that incentivizes everyone's participation and collaboration among what would otherwise be competitors? And it's tricky.
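A minimal sketch of the "one front door, many tentacles" pattern Danielle describes: a single catalog routes researchers to datasets that stay under the control of different institutions on different clouds, rather than copying the data into one place. All class, dataset, and backend names here are hypothetical illustrations, not Terra's actual interfaces.

```python
# Sketch of a federated "front door" over datasets that live behind different
# institutions and clouds. Names and backends are hypothetical, for illustration.
from dataclasses import dataclass
from typing import Protocol


class DataBackend(Protocol):
    def fetch(self, dataset_id: str, query: str) -> list[dict]:
        """Run a query against data that never leaves this backend's boundary."""
        ...


@dataclass
class GcsBackend:
    project: str
    def fetch(self, dataset_id: str, query: str) -> list[dict]:
        # Placeholder: in practice this would call the managing institution's
        # own GCP-hosted service after an access-control check.
        return [{"backend": f"gcs:{self.project}", "dataset": dataset_id}]


@dataclass
class AzureBackend:
    tenant: str
    def fetch(self, dataset_id: str, query: str) -> list[dict]:
        # Placeholder for a backend operated on Azure, e.g. by a health system.
        return [{"backend": f"azure:{self.tenant}", "dataset": dataset_id}]


class FrontDoor:
    """The single entry point: a data library mapping dataset IDs to backends."""
    def __init__(self) -> None:
        self._library: dict[str, DataBackend] = {}

    def register(self, dataset_id: str, backend: DataBackend) -> None:
        self._library[dataset_id] = backend

    def query(self, dataset_id: str, query: str) -> list[dict]:
        # The researcher sees one catalog; the data itself never moves clouds.
        return self._library[dataset_id].fetch(dataset_id, query)


if __name__ == "__main__":
    door = FrontDoor()
    door.register("ucsc-genomes", GcsBackend(project="ucsc-managed-project"))
    door.register("health-system-ehr", AzureBackend(tenant="hospital-tenant"))
    print(door.query("ucsc-genomes", "SELECT ..."))
```

The design choice the sketch illustrates is that each "tentacle" keeps its own access controls and cloud account, which is what makes the multi-cloud and controlled-access questions she raises next so tricky.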

And I think there are open questions about how we leverage, for example, what GCP has released, Anthos and BigQuery Omni, which are essentially compute mechanisms that are kind of GCP-based but can be deployed on other clouds. And to what extent do we leverage that, which is also a little win, with GCP recognizing that multi-cloud is kind of the future,

versus where we kind of federate across multiple clouds. And for us, that is a challenge infrastructurally. I know there are other kind of software shops and platforms that have fixed it and managed it, but I think what makes it a little bit more complicated for us is that we have a commitment to making things as open as possible while also kind of managing the

restrictions of data use and data access that are kind of pushed on us by the fact that it is controlled-access research data we're talking about. So a lot of trade-offs there. I'd be curious, I mean, James, can you tell me about kind of the extent to which Redox is working across clouds and whether that's been challenging for you?

James: [00:18:59] Yeah. Yeah, for sure. So we decided to go with AWS more or less from the beginning. We actually started on kind of a managed service on top of AWS called Aptible. And Frank, the CTO of Aptible, is actually going to be a future guest of the podcast, so super excited about that.

But they basically wrap up all of the HIPAA compliance components for companies who are looking for a managed service to do that, and it was really critical for our start to be able to leverage their experience there. And eventually we moved directly to AWS, not because of any sort of dissatisfaction with them, but more that we were doing a lot of stuff at the networking layer, and most of their companies were really patient- or provider-facing.

And so we really needed to get closer to that networking layer and move away from that managed service. But yeah, so we've been on AWS more or less since the beginning of our company, for about six years. And we do actually have some stuff running in GCP, which is actually our AWS testing harness.

And so, if you are testing your infrastructure, you don't want to test it from within the infrastructure that could go down, so we actually use GCP kind of exclusively for that. And yeah, so we don't really do any kind of cross-cloud or multi-cloud hosting right now, other than some of those additional services that we use to help maintain our AWS environment.

But it's definitely something that we've looked into. And it's also something where it's a really tough decision between the pace at which things are changing versus kind of the switching cost. I mean, moving or adding in an additional hosting provider is a pretty significant level-up in terms of just the maturity of data syncing.

And for us, we do a ton of near-real-time work. So, you know, if we were a little bit more batch processing, I would say it would probably be quite a bit easier, but yeah, having multiple environments doing that is a real challenge.

Danielle: [00:21:07] Oh, that makes a ton of sense. Yeah. I think of the world, and this is kind of my own data management philosophy, but I feel like there's fast data and slow-and-perfect data. And I think research is truly slow and perfect for the most part. It needs to be curated, and more is always better, but you're willing to wait for that large, perfect dataset.

And I think that's the world we're operating in, and I think it is different from Redox in that way. You really are focused on real-time data, and even if it's a little messy, people are willing to tolerate that, because it's like, I've got to know the time of the appointment or whatever.

James: [00:21:39] The other thing we're doing is leveraging as much infrastructure as code as possible. So we use something called Terraform, where in some ways you trade off: for the portability and repeatability of what you're doing, you may not always have access to all the specific components of each cloud host, because they kind of find the, you know, least common multiple between all of them in a lot of ways.

But for us, it really covers a lot of the use cases, and I think that's going to be a necessary tool for us to extend either out to additional regions of AWS or to have a multi-cloud setup. I think that's really pretty critical to doing it with a relatively small team.

Danielle: [00:22:23] I have a follow-up question. I think we're both probably in a similar place where thinking about US-only is sort of a less complex problem, but, you know, ultimately you're not successful until you're kind of deployed internationally.

When you talk about regions, are you more thinking about US regions, or are you thinking international as well?

James: [00:22:43] Yeah, I was actually thinking about both. That's an astute question. I think the biggest challenge for international for us has been that there are scenarios where we would actually have to have an in-country support team or something like that. The technical lift is oftentimes pretty feasible and pretty understandable.

The operational and kind of logistics lift of providing, you know, support from Germany or from Canada or something like that would often mean setting up a new office, hiring more people, things like that. And that's really where the friction has hit for us in terms of thinking about our expansion.

Danielle: [00:23:28] Oh, that's interesting. That's interesting.

James: [00:23:30] There are also some really interesting data-specific requirements around what is allowed to be transacted and what is not.

And sometimes those vary by country. So there's also a bit of a policy or legal research component for us as we're exploring different countries. And I feel like I may not have full confidence in what I'm about to say, but I believe, for example, in Canada, communicating the patient's race is in some way forbidden.

And so typically what they do, you know, even lab results could vary by the patient's race, so they'll just communicate in the lab result, in the text, all the possible reference ranges for any possible race, and then that's up to the physician to interpret with the patient kind of directly.

And yeah, I believe it's part of their kind of legal system. And so variances like that may come up in every specific country for us. So there's a little bit of homework for us to do ahead of talking about new countries as well.

Danielle: [00:24:33] On a separate subject, I think that is really interesting, race. You know, with Black Lives Matter and the position of America kind of waking up to systemic injustice, it's very prevalent in the healthcare system, and I think genomics plays an interesting role here.

Often I think of genomics as sometimes a little bit of navel-gazing, right? If someone's got type 2 diabetes, which is a reversible disease, we know how to treat that with diet and exercise, yet we fail to treat it. And so the more callous side of me kind of looks at genomics and says, what more do you need to know?

We already have the information that we need to treat people. On the other side, I think genomics plays a really pivotal role in shaping the next generation of biometrics, because, to that exact point, often the biometrics we use today have correction factors based on race, and race is a perceptive thing for the most part, right?

If someone passes as white or passes as black, you can often potentially screw up your calculation, and those calculations can lead to undertreatment or under-risk-scoring someone who actually deserves and needs better healthcare treatment. So I didn't know Canada had sort of nixed that concept already and said race isn't appropriate.

They haven't figured out kind of the second piece, which is how you interpret lab values correctly, because they can vary by people's heritage. But I do think once we kind of get to a more advanced place in genetics, that will plug the gap that race is kind of screwing up.

It's like a good proxy, but not good enough, and it's propagating health inequities. So one example of that today is a concept called a genetic risk score, which is essentially an algorithm that you can run on your 23andMe data today. And they're not even advanced data science.

They're just basically linear regressions of your entire genome correlated with symptoms, or what we call phenotypes, kind of the biological expression of a disease. So for example, we'll study hundreds of thousands of people and look at the sort of linear regression of their genome among people who have heart attacks, and we can build a polygenic risk score

for a heart attack, or for schizophrenia, or for diabetes. And essentially you have this reproducible algorithm that you can kind of authorize to run on your 23andMe data. And 23andMe is important here because it's much cheaper: 23andMe and Ancestry.com only sequence certain parts of your genome, not the whole thing.

And that's cheaper. So the point is like, you've got kind of inexpensive. Genetic data, you can run this algorithm on it and it will tell you like, Oh, you're actually in the top 10% for risk of heart attack. And those are the people that you really want to kind of target for diet and exercise and stress reduction more than just kind of the general society today where it's like, Oh, your LDL or your age.

and those are kind of the only two predictors we have, so the recommendations and treatments aren't often targeted specifically enough. So I get really excited about that. That's a tangent related to kind of race and data modeling, but I am excited about it.
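A toy illustration of the risk score idea Danielle describes: conceptually, a weighted sum over genotyped variants, with per-variant weights derived from large association studies, then compared against a reference population to get a percentile. The variant IDs, weights, and simulated population below are made up for illustration.

```python
# Toy polygenic-risk-score sketch: a weighted sum over risk-allele counts.
# Variant IDs and weights are invented; real scores use thousands of variants
# with weights estimated from large association studies.
import random

VARIANT_WEIGHTS = {
    "rs0000001": 0.12,
    "rs0000002": -0.05,
    "rs0000003": 0.30,
}

def polygenic_risk_score(genotype: dict[str, int]) -> float:
    """genotype maps variant ID -> risk-allele count (0, 1, or 2)."""
    return sum(VARIANT_WEIGHTS[v] * genotype.get(v, 0) for v in VARIANT_WEIGHTS)

def percentile(score: float, population_scores: list[float]) -> float:
    """Where this person falls relative to a reference population."""
    return 100.0 * sum(s <= score for s in population_scores) / len(population_scores)

if __name__ == "__main__":
    random.seed(0)
    # Simulate a reference population of genotypes for the same variants.
    population = [
        {v: random.choice([0, 1, 2]) for v in VARIANT_WEIGHTS} for _ in range(10_000)
    ]
    pop_scores = [polygenic_risk_score(g) for g in population]

    me = {"rs0000001": 2, "rs0000002": 0, "rs0000003": 2}
    my_score = polygenic_risk_score(me)
    print(f"score={my_score:.2f}, percentile={percentile(my_score, pop_scores):.1f}")
```

The "top 10% for risk of heart attack" framing she mentions is essentially this percentile step applied over a much larger variant set.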

James: [00:28:00] Yeah, I kind of share your excitement that, you know, a lot of demographics have been used as proxies or abstractions for what really should just be personalized medicine. And there are areas where, you know, there are things that are not as visible, that vary just as much across groups, that I'm super excited about.

There's a set of genomic tests as well for medication efficacy. I've learned about a few companies that are doing testing specifically to find efficacy for mental health medications, for example, because there are certain markers that indicate that certain types of drugs are just ineffective, or, you know, the side effects are worse than the actual benefits, things like that.

So yeah, I'm super excited for that space to become more commercialized as well, or more consumer-driven, so that, you know, patients can really be in charge of their own information and be able to share that with their providers and get the right care they need.

I'm kinda curious about your take. So this is one of the things that I've been keeping an eye on too, and you may have an interesting perspective on it: I've been thinking about a lot of the kind of economic changes in healthcare due to COVID and the pandemic, and some of the scenarios that we're seeing now are driving a lot more consumer-focused approaches.

So there's telehealth now, which is kind of leading people to question, why should I even go to the clinic when I can just stay at home and be on a video conference? And then that raises the question of, why does my doctor even need to be in the same city as I am? Can I just talk to anybody?

And people are getting tests mailed to their houses, and there's an expansion of diagnostics as sort of a retail space as well. So I'm kinda curious, from your side, are you seeing any of that, or how are any of the changes right now impacting you?

Danielle: [00:29:50] Yeah. So I'm definitely bullish on any organization, startup, enterprise, whatever, that is about scaling healthcare resources. I think Redox, obviously, you fall squarely in that category, but in general, that was always an important aspect, and now more so than ever. I think it's challenging, it's challenging the assumptions about what daily practice needs to be.

Right, both in healthcare and in work, in many other ways. So I completely agree with you. I don't really have any novel insights into that, but I completely agree. I think the thing that I've been spending a lot of time on has been the economic aspects of paying for healthcare and having that be employer-based.

And that's the thing: I think with record unemployment and record utilization of healthcare, there's going to be a very large medical loss ratio from America's hospitals that we haven't yet discussed or dealt with. And the disconnect I see is, in times of a pandemic, we are still a hundred percent talking about the economy.

And the issue I see with that is the fact that we need the economy to run so that employers can provide jobs, and through jobs comes medical insurance. And I hope that this kind of proves a model whereby maybe there's another option for people to have insurance, because those medical bills are lagging.

They come at a later date. And it's the thing that I don't hear enough people talking about: how the model's kind of failing us right now. Does that make sense?

James: [00:31:39] Yeah, 100%. I think we're one of the few countries where it works this way, and we're also one of the least efficient countries in terms of delivering healthcare. So, you know, there's some correlation there, probably.

So maybe moving back a little bit to some of the data science work that goes on at the Broad and on your team: could you tell us a little bit about how much data science in healthcare is about math and knowing the models and the algorithms, versus knowing the actual content of how healthcare works and kind of the medical-specific components of it, and what you look for in members of your team?

Danielle: [00:32:16] You know, the Broad focuses on collaboration, I'll say that again. And I think a big foundation of collaboration is intersectionality. You need a lot of different types of expertise in order to make this work well. And I think we're not unique compared to other organizations in the healthcare or life sciences field, where, when you need a data scientist or you need a software engineer, you don't necessarily need them to have healthcare expertise,

as long as there are people with healthcare expertise at the table. And so our organization's not that different in that way. We have some amazing machine learning scientists; we just hired a group of people who were working at Uber before this, and I wish I knew the specific algorithms and tools that they had been responsible for before, but they were good ones, right?

We hire from all different aspects of the data science community, and really focus on anything ranging from optimizing a pipeline for throughput, so that you can sequence data faster and do quality checks on data faster, to machine learning algorithms that, for example, can co-train on cardiac MRI data with an ECG, so that you can detect ventricular hypertrophy not from imaging but from ECG signals that would otherwise be undetectable.

So we've got teams that can do all of that, but they need direction from academic and clinical researchers. And so we fail the moment, you know, there's no scientific champion at the table and no clinical reviewer reviewing the labels that we've created. And so I think we really thrive because we've got all of those seats at the table, but I think that most healthcare companies that are successful in data science have that kind of makeup as well.

So I'm very, very encouraging of intersectional teams, and that's how we do it, both in data science and software and with our scientists, too.

James: [00:34:08] And one other question I was going to ask, about sort of your model and kind of the open-source approach and the collaboration with the outside world: just a little bit about how that works, you know, how people get engaged, if you have any cool success stories or anything like that?

Danielle: [00:34:25] Open source is a pledge we make to the community. I think a lot of people who write open-source software know that open source doesn't necessarily mean reusable and implementable, right? In the early days, we built components, and one example is we built a workflow execution service and a specific, easy-to-use

workflow execution language for researchers. We built those two, we open-sourced them, and there's good adoption in the community. You know, software engineers from cloud vendors like AWS have kind of built backends for our own workflow execution service so that users in the community can deploy it; we've got an Alibaba Cloud backend, we have an Azure backend.

It works on prem too, on HPC, et cetera. So that's been a really great example of community adoption of something that we've built, and it promotes standards of data processing, which ultimately promote reproducibility of scientific results, which is obviously a challenging thing when you're in kind of an emerging field.
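As a hedged sketch of what using a workflow execution service like the one Danielle describes might look like for a researcher: write a workflow in a simple workflow language, then submit it with its inputs to the service's HTTP API, which runs it on whichever backend (GCP, Azure, Alibaba Cloud, or on-prem HPC) the deployment is configured for. The endpoint path, field names, response shape, and workflow text below are assumptions for illustration, not the service's actual API.

```python
# Illustrative submission of a workflow to a generic workflow execution service.
# The endpoint, form field names, and response shape are assumed, not documented API.
import json
import requests

WORKFLOW_SOURCE = """
workflow align_sample {
  # hypothetical workflow-language syntax: one task per processing step
  call align_reads
}
"""

INPUTS = {"align_sample.align_reads.sample_bam": "gs://example-bucket/sample1.bam"}

def submit_workflow(service_url: str) -> str:
    """POST the workflow definition and inputs; return the run's ID."""
    resp = requests.post(
        f"{service_url}/api/workflows",  # assumed endpoint path
        files={
            "workflowSource": ("workflow.wdl", WORKFLOW_SOURCE),
            "workflowInputs": ("inputs.json", json.dumps(INPUTS)),
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]  # assumed response shape

if __name__ == "__main__":
    run_id = submit_workflow("http://localhost:8000")
    print("submitted run:", run_id)
```

The point is the separation she describes: the same workflow text can run on different backends because the execution service, not the researcher, knows about the underlying cloud or HPC environment.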

Now we're kind of at phase two, where what we're operating for the ecosystem is an amalgamation of a lot of different components. You know, we're operating a service more than we are offering components or pushing code out into the ecosystem for community adoption. And so in that world, you know, everything is still open source.

I could say anyone can stand up our platform, but I don't think anyone would, 'cause I think it costs a lot of money and a lot of headache just maintaining it, trust me. And then it's sort of like, let us do that for you. At the same time, we've open-sourced the kind of data management layer.

So anyone can kind of stand up a new node of the network next to the mothership, so that researchers can access their private data maintained in their own node, behind their organizational boundary, and connect to the more publicly available kind of community research data that's available through the mothership and other nodes.

So that's kind of the model that we have today. I will say, you know, I believe that software doesn't necessarily need to be proprietary, and I would argue there's probably not a huge difference between open-source and closed-source software besides optics, unless you're building something very small and nimble, or it's like an algorithm or a code package.

So I like that we're open source. I think it's a nice thing that we can pat ourselves on the back for, but is it a huge differentiator? Are people using our software because it's open source? No, I think people use software because it's easy to use and it meets the user need. Where I do see the community going is kind of in a similar way with algorithms, too.

So, for example, we're building a lot of machine learning algorithms. We're building them for purpose, for different collaborators that we're working with. And you kinda need a common engine to spit some of this stuff out, kind of work on labeling, and be built across multiple datasets as new data comes in. In general, you know, in the biomedical research world, the more biomedical data you have, the better, and there are a few kind of large-scale data assets that are

very useful. The UK Biobank is one. The US is building a dataset called the All of Us precision medicine program, and I think it's now the All of Us Research Program, but it's going to have a million Americans available for research. 300,000 people have consented and have their EMR data in the system today.

They're all going to have whole genome sequencing, so pretty soon this dataset is going to be a very common reference for population genetics. And anyway, the point is, you're going to want to build kind of a reusable ML engine on top of that, one that can spit out different algorithms and relabel and kind of iterate as needed.

And there's an open question for us in terms of which components of that should be open source versus not, right? 'Cause at some point in time there's gotta be something that's protected, something that's kind of special to an organization. And I think I'm increasingly of the mindset that the kind of processing and the engine should be open source.

It should be available to the community, but what gets spit out, the sort of most downstream deliverable, the outcome of the engine, should probably be the IP in that situation. I just think there's this constant shift: raw materials start out as IP, then they kind of get commoditized, and then they should be open source and available to the community.

And so I think that's happening in data storage and data science. And I think the kind of built-for-purpose algorithms will probably move to a world of being more IP, and the raw exhaust and kind of the inputs should be available to the community broadly. But it's a...

James: [00:39:08] Yeah, I would say that that's actually very similar to the world that Redox exists in as well, where, you know, the world 10 years ago was that all of the monetization and services were around getting the data from point A to point B. And our goal, at least, is to totally commoditize that part and continue to focus on, sort of, the top of the treadmill being: what are the new services that are lowering the friction to get into healthcare, lowering the friction for a developer in the healthcare space?

And as we get adoption, continuing to kind of advertise that work across the entire developer community. So yeah, I think it's a trend we see too.

Danielle: [00:39:50] I'm glad we're aligned on that.

James: [00:39:52] Maybe this is a good point to segue into: do you have any recommended resources, or anything for folks who might be getting started, either with a technical background just coming into healthcare, or just starting out their careers? Anything you would suggest in terms of resources?

Danielle: [00:40:09] The one other aspect that I would encourage people to read up on is the fundamental kind of economic factors of healthcare. So it's one thing to understand the businesses; it's also important to understand the foundations, and for that, I would read anything by Michael Porter. He's a great healthcare economist, often talking about effective ways to stratify populations and deliver care effectively at high quality and low cost.

And then the one other thing that I think about a lot is kind of how economic factors drive healthcare outcomes, that is, how healthy people are. And for that, I would really recommend either getting a summary or reading The Health Gap, which, especially in this day and time, when unemployment is skyrocketing and there's a lot of civil unrest, I think understanding how much social factors like

poverty, and how well school systems are invested in, really have profound impacts on health outcomes. And I think it'll give you a better sense of sort of the factors at play, and where businesses and technology can really enable and change healthcare for the better, including delivering care to more people in a scalable way.

So if you're into sort of reading as your method of learning, I think those are great resources. Otherwise, maybe just learning on the job. I mean, James, I feel like you and I have both kind of learned everything that we know from our work in it.

Right.

James: [00:41:38] Yeah, absolutely. I was actually gonna mention this before, when you were saying how you got started: our stories aren't that different in some ways. I graduated college, I think, right about the same time, and I was actually kind of just playing poker semi-professionally for about a year or so before

Epic actually reached out and I started there. And yeah, going into it, I was a physics major, I did a bunch of computational physics, so I knew programming, but knew next to nothing about healthcare. I really learned on the job there as well. And even as we're hiring at Redox, we're very much looking for a diversity of backgrounds.

And, you know, our goal is to change some of the status quo in healthcare, and we're very intentional about not always bringing in folks with a healthcare background, because we need to learn from other disciplines and other perspectives. So yeah, absolutely, learning on the job is 100% the best way to go.

Danielle: [00:42:36] Yeah, I love that. I love hiring for differences also. I completely agree. 

I think hiring for differences is like the number one most important thing we, and other organizations like us, can do right now.

James: [00:42:46] Great. Well, with that, thank you so much for joining us, Danielle, and yeah, I will be talking to you soon. And just as we close, if folks are interested in reaching out to you, what's the best way that they could get in touch?

Danielle: [00:42:59] They can hit me up on LinkedIn. I have a website too, Danny geo.com and you can kind of learn more about me and my philosophy, but LinkedIn is probably the easiest one. 

James: [00:43:07] That sounds great. Yeah. And we'll include links to everything that Danielle has mentioned, including how to get in touch with her, in the show notes. But thanks again for joining us, Danielle.

Danielle: [00:43:16] Thanks bud. This is great. Nice talking to you.

James: [00:43:18] There you have it, our second installment of the Shift+6 podcast. Thank you so much to Danielle for being on the show. Join us next time, where I'll be talking with Greg Tracy, fellow Madisonian and CTO of Propeller Health. And remember to subscribe to Shift+6 so you don't miss out. And a quick favor: we're a new show,

so leaving us a review and rating is super helpful as we get our podcast legs under us. And as always, please send feedback or guest ideas to podcast@redoxengine.com. We'd love to hear from you. And finally, thank you for listening to Shift+6, a podcast for healthcare developers.