Roche at ASHG 2025: Key highlights and resources
Advances in sequencing by expansion (SBX). Multiomics, methylation mapping, oncology research and building a sustainable framework for ultra-rapid genome sequencing.
*This transcript was generated using an AI-based transcription tool and may contain errors. It reflects the spoken content of a live webinar and has been lightly edited for clarity.
Mitu Chaudhary: My name is Mitu Chaudhary. I will be your moderator, and I will also be introducing our key speakers. I'm a Senior Director, International Business Leader at Roche Sequencing Systems, and my team is responsible for managing our sequencing systems products, which include the AXELIOS platform that we'll be talking about today, as well as our automation platforms.
But essentially very, very quickly, we do have a rich portfolio of sequencing technologies, primarily in the sample prep space, with our KAPA library prep and target enrichment, and also with our Navify Insight portfolio. Then, over a period of time, we have been developing products on other sequencing technologies, really building these tools together into effective solutions. For example, our AVENIO assay kits, like our comprehensive genomic profiling kit.
But going forward, this is where our platform comes in. We have talked about the AXELIOS platform, and the product that we will be launching next year will be called AXELIOS 1. This will be the name of the platform that we will be launching next year. And what this really indicates is our commitment to continue to drive innovation on SBX technology, which is already showing a lot of potential. We are actually going to talk a lot more about the new applications that we have been demonstrating on the backbone of SBX, but this is really the first in its class.
I'm also very happy to announce the pricing of the AXELIOS system. Never heard that question from anybody, I know you're not wondering at all! But I thought I'd just tell you anyway. So, in the United States, the platform will be priced at USD 750,000. We believe that this pricing will enable adoption across a wide variety of labs, for a variety of applications.
The second question nobody ever asks me: what about operational cost? That's where you will still have to wait just a little bit longer. However, what I can say is this: the platform performs sequencing runs in a much shorter time, and so what we are envisioning here is a price per gigabase that is very, very competitive with the high throughput systems, at market standard accuracy.
Now, we did talk a little bit about duplex and simplex, and we'll actually showcase a lot more on this in the upcoming presentation. So, what you will see there, especially with our simplex approaches, there we can enable a wider range of use cases and leveraging that very, very high throughput we actually see that that pricing will enable you to do deeper and broader studies at a scale that has not been possible until now.
With this system in place, AXELIOS 1, and our analytical portfolio with the XOOS analysis platform, what we envision is really across each of these verticals in the sequencing workflow, to have these individual tools and technologies that are very, very capable. But then also what you will start to see from us are, at the bottom as I say, Roche assay kits, which are these solutions for the future, for example in the germline space, as well as in the oncology space with things like MRD, genomic profiling and cancer detection.
With that, it is my pleasure to actually introduce the key speakers for today. I'll first introduce Mark and then Katie Larkin.
Mark has nearly 30 years of experience in biotechnology, co-inventing the SBX technology in 2007. He formed Stratos Genomics that same year with co-founders Allan Stephan and Robert McRuer. First, he served as the company's President and Chief Science Officer and then as a Chief Executive Officer. He led Stratos through its acquisition by Roche in 2020. Before Stratos, Mark founded BioCaptus in 2003, it was a biotechnology consulting firm. He also served as the Director of Technology for QIAGEN Genomics and was a founding member of Rapigene. He holds a BS in Biochemistry from the University of California, Davis and is an inventor on over 20 US patents.
I'll introduce Katie Larkin as well. Katie is the Director of Clinical Product Development and Strategy at the Broad Clinical Labs, where she helps drive product innovation and shapes the strategic direction to advance BCL's mission. She has been at the institute since 2009 with experience spanning lab operations, clinical sequencing, and next-generation sequencing technologies. Mark, over to you.
Mark Kokoris: So, the problem with me with a presentation like this is, where do you stop? Because for everybody who knows me, I kind of want to relentlessly push in every direction all the time. So, for my team in Seattle, I just want to say thank you for putting up with me, I think, getting ready for this presentation.
But it is all that. So many years of pushing on the technology. I just think about seeing the things that we can do, which is very exciting. And so, what we came up with here today is several new applications: talking about the longer RNA sequencing. We're introducing this bidirectional longer RNA sequencing project that we're doing with Sanger and the melioidosis sample set. Also extending some work with Aziz Al'Khafaji's team at Broad. We're actually going to introduce some multi-omics work here as well, including some methylation data that we generated, and then show that with some tumor normal samples. Introduce target enrichment. I get a lot of questions about that, and so we'll just show a little snapshot of some data there. Then finish off with some vignettes for whole genome sequencing, FFPE, MRD and SBX-Fast.
And then you'll notice the little symbol on there, I don't know if this has gotten out there yet, but we've actually broken the Guinness World Record for the fastest DNA sequencing technique. I'll show the time at the end of the presentation and Katie is going to spend her part talking about why that's important to be able to do. And so again, I just really want to thank my team in Seattle again for all the hard work. I want to thank the CSI team, and the related teams in Roche, for pushing so hard to put all this material together. I think it's quite a bit of stuff to cover here and it's just an amazing team that I have a lot of respect and admiration for. OK. And if it wasn't clear before, yeah, I got the world record, so this is a little bit bigger here!
OK, technology overview. We've had two webinars now, several presentations and that seminal pre-print. If you're chemistry oriented, read the pre-print. It's pretty crazy! I read through it the other day and I was there for every second of the work we did, and I can tell you I'm still scratching my head with some of the stuff. So, it's a nice read to get some background on the technology prior to the Roche acquisition.
So, we talk a lot about flexibility and performance throughout and that's just for everything that we talk about, you're going to want to be thinking about that flexible operation. It's just kind of implied with everything. We'll show examples of the higher accuracy. The high throughput is pretty much there for everything that we're doing. And then today, we're going to show some of that longer read work that we do and of course showing that world record.
So, the foundation of the technology is Stratos Genomics’ SBX chemistry coming together with the Genia high throughput array. And, you know, again, you can get a lot of that background information from some of the previous materials.
The AXELIOS system, now we have the pricing, so we know that. What that is, is that synthesis instrument and sequencer, and of course library preparation which Jagdeesh Chandrasekar went through quite a bit of that earlier at the CoLabs. So, we'll cover a little bit of that here. And just a little bit of background for those of you who didn't see this before.
We break up the sequencing library prep workflows into two groups. SBX-D creates that hairpin Y-adapter structure that allows us to make Xpandomers that we can get intramolecular consensus reads on. So that's where the high accuracy applications sit, and we're going to talk about most of these today.
But then you've got the simplex side. So, for example, target enrichment, some of the short read flex stuff, but then also the longer read stuff, which we're going to spend quite a bit of time on today. So, you can look at structural variants, phasing, isoforms, things like that. Again, all of that being leveraged because of the throughput and the capability of the technology. This kind of shows again another way of looking at our scale from low to high. And the read lengths as well, but then being dissected centrally from single omics, to now some of the multi-omics things that we're talking about here.
OK. And then the wheel here, I think we've done most of these things. There are things that are not on the wheel that should be and will be. And I think we're just going to keep going around the wheel filling things in and showing what's possible with the technology.
So, shifting to early access. So, we did work, started this project. End of August, I believe, we installed the system. I see Mike Quail down there in front of the system looking quite happy. So, we did the install, it worked perfectly, and we decided, OK, well, let's do a data set. So, Emma Davenport had some samples that we could run and were a good fit for SBX-SLR, which is what we're going to cover here with this project. And so, we said, well, let's go for it. Let's see if we can get some data to show at the conference.
So the workflow again, SBX Simplex Longer RNA (SBX-SLR) is pretty straightforward. You can see the poly A portion, so you get the directionality of the structure there. We do a Y adapter ligation, and these are kind of custom Y adapters that bring in the sample ID. And if you were to want UMIs you could bring UMIs for other applications there as well, all compatible with our linear amplification protocol. And so after that you can see, directionally, what the Xpandomer or sequencing reads would look like. And just to put a number on it, we're looking at 2.5 to 4 billion reads per hour in that 400-600bp size range. But again, there's a pretty big histogram there. Those are average read lengths. So you'll see that, because of the throughput, we get a lot of 1,000-mers and longer reads out of that.
OK, so melioidosis is a bacterial infection with a mortality rate of 26% where individual response to treatment varies. And the project is using SBX-SLR to examine differential gene expression, splicing signatures and disease endotypes. So, for the cohort, I think her cohort was about 1000 in size. We took a subset of that, 90 samples broken apart into healthy, survivors, and non survivors for this particular study. As we were short on time, we just wanted to give it a look.
And again, this is the first time we actually tried to run this in the lab at Sanger, and actually anywhere. So, pretty straightforward, intuitive protocol, 200 nanograms bulk RNA, we used an NEBNext kit for the library preparation, brought cDNA into that Y adapter ligation and then went on to the SBX-SLR workflow to generate reads. We over sequenced, of course, on purpose. We wanted to over sequence just to kind of get a sense of what that would look like. So, we got 138 billion reads in 36 hours. I think that's probably on the lower side. I think we got even more room to expand that quite a bit. Average read length around 400. You can get an idea of the histogram right there. Samples were totaling 90 samples, and for each sample we had 1.5 billion reads, or 3.8 billion reads per hour.
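As a quick check on the run figures quoted above, the arithmetic can be sketched in Python. This is only an illustration using the rounded numbers from the talk; the function and constant names are hypothetical and the even split across samples is an assumption.

```python
# Rounded figures as quoted in the talk.
TOTAL_READS = 138e9   # reads generated in the run
RUN_HOURS = 36        # sequencing time in hours
NUM_SAMPLES = 90      # samples in the subset


def reads_per_hour(total_reads: float, hours: float) -> float:
    """Average sequencing rate over the whole run."""
    return total_reads / hours


def reads_per_sample(total_reads: float, samples: int) -> float:
    """Average reads per sample, assuming an even split across samples."""
    return total_reads / samples


if __name__ == "__main__":
    rate = reads_per_hour(TOTAL_READS, RUN_HOURS)
    depth = reads_per_sample(TOTAL_READS, NUM_SAMPLES)
    print(f"{rate / 1e9:.1f} billion reads per hour")
    print(f"{depth / 1e9:.1f} billion reads per sample")
```

Both results round to the 3.8 billion reads per hour and 1.5 billion reads per sample mentioned in the talk.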
So what can you do with that? Let's dive in a little on the biology and Emma will cover this in a lot more detail, but I just want to touch on a few things. You can give a sense from the picture there looking at differentially expressed neutrophil markers for the non survivors, you can start to differentiate, some of the expression there. Just kind of a little snapshot picture. And again, she'll cover a lot of this.
Same thing on genes with differential splicing. Nothing surprising here, you're seeing genes associated with melioidosis, more immune function and regulatory genes. So, really as expected here, including some of the IG, IGH and BCL genes as well. So really as expected.
Similarly, Gene Ontology, the revealed pathways are associated with immune response and pathogen interactions. So, all making sense. Emma will dive into that, not just here. I imagine she's got quite a bit more analysis to do given how many reads that we were able to generate for this project.
And then taking a quick look at BCL6 and some of the isoforms, we had two here in particular, we looked at a 700 and a 3000 base isoform here. And just looking at transcript usage for this particular one, looking at the group of healthy that yield those yellow boxes up there, you're seeing a different transcript usage there. So again, a lot more to come on this, a lot more detail. Just wanted to touch on it and say that this is something that we were doing and looking at, and these are the kind of slides for me that get into the detail that I'm really looking for. It’s OK, well, so do longer reads enable more and ubiquitous isoform identification? I think so.
When you do the comparison to the run that was done for the original 1000 sample cohort, the Illumina run, it was using 37.5 million paired-end reads per sample. When you compare that to the longer single-ended reads at 37.5 million, you can see the difference in the fraction of isoforms detected. When you bring that up to 75 million, which is actually probably a fairer comparison in terms of total reads, you see an even higher number. And going to the saturation curve, looking at 300 million reads, you're getting that 99%, so you kind of bottom line it here.
From 138 billion reads that we produce for the study, we could have sequenced the entire 1000 sample cohort in 36 hours at a depth about 125 million per sample. Probably around 98% saturation. First time we ever did it. You know, really happy to see this. And then a lot more to come along these lines. And there's a lot of buttons to push. Everyone's going to want to push that length out a little bit more here. These are blood samples so they're already a little bit on the shorter side, but very typical. So I think there's a lot of room for us to kind of move that length a little bit further, move the throughput, but those are going to be things that we continue to work on.
So, staying in the same space as the SBX-SLR with Aziz Al'Khafaji's lab. We added another project, and Brian Haas will be speaking about this tomorrow. Aziz is at the meeting in Australia, so couldn't be here, but offered to provide a couple of slides here, to kind of give some of his perspective on this.
Alternative splicing is a key regulatory axis of biology that must be resolved by RNA seq assays. And then transcript splicing modulates protein isoform, translation efficiency, etcetera. And there are just numerous examples of splicing found to be essential in development and drivers of disease, so, you know, basically driving to that conclusion: splicing variants are deeply under-resolved. That's kind of his position on this and why he's really excited. We talk all the time about the kind of things we want to do together in the collaboration.
So, looking at this a little bit more, standard RNA seq is blind to the rich diversity of proteins emanating from the splice forms of these genes, and the question is, can SBX-SLR longer read lengths enable measurement of transcript isoforms at scale? So that's the question. And the study we did for this is using the DepMap cell lines and you get that picture on the right there to kind of show the spread of the different cell lines used here. The overall project again, SBX-SLR measuring differential gene, isoform, and transcript fusion expression across the DepMap cell lines.
Similar to the previous project, around 96 samples, but in this case they had a modified template switch protocol that they developed. These were a bit longer reads than what we did for the previous, and it shows when we look at the yields here. So, again the throughput over 100 billion reads, around 500 in length, and you can see a noticeable shift to the right in the read lengths there. Then the total reads per hour about 2.8 billion, so kind of right in that range of what we'd expect. Again, the first time we ever ran this project.
Just looking a little bit deeper into it, just an example, GAPDH isoforms about 1.3kb, below you have the reference isoforms by the green box there. And SBX-SLR is able to unambiguously identify that with one contiguous read there as an example. But then the shorter reads would obviously have a very hard time doing that.
Similar to the previous presentation, looking at the unambiguous isoforms identified, looking at both full splice matches and identifiable isoforms, 95% saturation was achieved with about 58 million reads, or 51 million reads for the ISMs.
Again, the main point really here is the workflow flexibility. We talk about it all the time. I think it's a very real thing, and I'm looking forward to seeing how people leverage that flexibility, the massive throughput, the ability to reach into longer read lengths as well as occupy the shorter read lengths at massive throughput. Flexible throughput I think is really the advantage here. And then one more statement here. With the 2.8 billion reads per hour, the 96 sample study could be run in three hours at a depth of over 100 million reads per sample.
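The run-time statement above is a round-number estimate, and the underlying calculation can be sketched as follows. This is a minimal sketch that assumes a constant sequencing rate and an even split of reads across samples; the function names are illustrative only.

```python
def hours_for_depth(samples: int, reads_per_sample: float, reads_per_hour: float) -> float:
    """Hours needed to reach a target depth for every sample at a constant rate."""
    return samples * reads_per_sample / reads_per_hour


def depth_after(hours: float, samples: int, reads_per_hour: float) -> float:
    """Average reads per sample after a given run time, split evenly."""
    return hours * reads_per_hour / samples


if __name__ == "__main__":
    rate = 2.8e9  # reads per hour, as quoted in the talk
    print(f"{hours_for_depth(96, 100e6, rate):.2f} hours for 100 million reads per sample")
    print(f"{depth_after(3, 96, rate) / 1e6:.1f} million reads per sample after 3 hours")
```

With the rounded 2.8 billion figure, the calculation lands a little above three hours for 100 million reads per sample, so the numbers in the talk are best read as approximations.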
So, we'll see, and I think Brian is obviously going to keep diving into the data and Aziz's team will keep looking at that. We'll keep making adjustments to the chemistry and tune the workflow around. But I think out of the gate, you know, really happy with what we're seeing there.
Now shifting to multi-omics looking at the Simplex Longer DNA (SBX-SLD) side of things, very similar. I mean, I can’t see a whole lot of difference between some of these pictures here. Again, focusing on DNA inputs, 20 to 50 nanograms, I think we can obviously go lower than that, but that's just the range we've tested so far. Same Y adapter ligation compatible with the linear amplification protocol, so very similar to what we showed before. And just one data slide here to show structural variant detection in cancer cell lines.
And again the same hypothesis here that longer reads can identify more supporting evidence for the structural variants. We note here the true positive criteria for both XOOS, the Roche XOOS suite, as well as DRAGEN that was used for the study, and then the read counts for both. So, again, we were actually using fewer reads than the comparison here, really in both cases, but the one thing that jumps out on the slide would be the insertion impact here. And this is pretty much as expected based on some of the prior papers, as well as longer read comparisons.
So, shifting back a little bit to the RNA-seq slide I showed before: same thing, same workflow. This is now more for benchmarking, same numbers we showed before, and just kind of getting an idea with the Genome in a Bottle cell lines, what are we seeing? Is gene expression concordant? The answer was yes. Transcript expression, the same thing, yes. The comparison looks very good, and we identified some of the more challenging regions to sequence that have low mapping quality. With the longer SBX reads we're able to map better there, whereas very few reads from the other technology were mapping at MAPQ greater than five, and that showed in the TPM difference on the left here between the two technologies.
OK. And there's Chen down at the bottom here. He has a poster. I would encourage people to go and walk by and keep Chen company. Ask him some questions about this. He's going to cover several different topics, but again, looking at somatic SV expression and the measurable expression of DNA variants for the 1395 cell line.
OK. And then the read count again, about 350 million paired end, versus 350 million single end. So again, a very fair comparison here. Then looking at the result, 32 were missed essentially by Illumina versus 12 here. And, of course, I asked Chen last night. I said, OK, well what are we seeing with the 12? Let's fix that, and of course we're going to go right at it and understand that a little bit more. So again, just data that we just generated very recently. Pretty excited about that.
Now shifting to multi-omics, a little bit more on the multi-omic side. Introducing this idea of SBX duplex with methylation, SBX-DM. And for this, our first attempt here, we're using the Watchmaker Genomics TAPS+ Methyl-seq kit to convert 5mC. So just the high-level overview of SBX-D again, using the hairpin Y adapter structure to make the structure that goes into our linear amplification, and then produces the Xpandomer reads as we've shown and talked about many different ways.
OK, but now if you add in this kit, and again, we just got the kit straight up from them, didn't do any modifications to the kit, and apply this to the SBX chemistry. To give a little background on the TAPS chemistry, it’s converting 5mC or 5hmC to T essentially. And one of the key steps is, there’s many, but I think one of the key steps is that reduction step that they were able to optimize. And I can say, I mean, having done really challenging chemistries, they did a great job of pulling this together. 98% direct conversion with low false positive rates. So, I think it's always tricky finding that balance with chemistry, and they did a really good job of that, and it shows in the data.
So thinking about the traditional Methyl-seq, the lower complexity, decreasing sequencing complexity, and the challenges that come with that. Versus the TAPS conversion where you're seeing more of a maintaining of the sequencing complexity with only 1-2% change. And then what does that allow us? It allows us to improve the alignment, and deliver epigenomic and genomic variant information simultaneously.
OK. So what does that look like? TAPS conversion protocol kind of show after the library prep and again, this is where we would apply any other method, and we will of course, we'll look at all the other methods and let people decide what they want to use. So in this case, we applied the TAPS chemistry at that point and then through just normal workflow after that.
OK. So SBX-DM as we've said can concurrently detect DNA variants and methylation signal from a single library. OK, so that's, you make the one library, you convert it, and you get all that information in one sequencing run. So after the base calling, we do demux, intramolecular consensus, and then do reference free methylation detection using the power of the duplex read. After the methylation status is recorded, the converted bases are reverted for mapping and alignment and for that you can actually just use any mapping tool because you're bioinformatically reverting those back.
And then you use the Roche XOOS suite for both DNA variant calling and methylation calling status. So really cool, really efficient. And we'll get into a little bit of the benchmarking that we did here.
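To make the idea of reference-free calling on a duplex read concrete, here is a toy Python sketch. It is not the XOOS algorithm; it only assumes, as described above, that the conversion turns a methylated C into T on the strand that carries it, so that the two strands of one molecule disagree at that position. The function name, the `N` masking of other disagreements, and the example sequences are all hypothetical.

```python
def call_and_revert(top: str, bottom_rc: str):
    """Toy duplex methylation caller.

    top       -- read of the top strand after conversion
    bottom_rc -- read of the bottom strand after conversion, already
                 reverse-complemented into top-strand orientation
    Returns (reverted_sequence, calls), where calls is a list of
    (position, strand) tuples for inferred methylated cytosines.
    """
    if len(top) != len(bottom_rc):
        raise ValueError("strand reads must have equal length")

    reverted = []
    calls = []
    for pos, (t, b) in enumerate(zip(top, bottom_rc)):
        if t == b:
            reverted.append(t)            # concordant base, nothing to do
        elif t == "T" and b == "C":
            calls.append((pos, "top"))    # C on the top strand was converted
            reverted.append("C")
        elif t == "G" and b == "A":
            calls.append((pos, "bottom")) # C on the bottom strand was converted
            reverted.append("G")
        else:
            reverted.append("N")          # unexplained disagreement (assumption: mask it)
    return "".join(reverted), calls


if __name__ == "__main__":
    # Original molecule ACGTCGA with a fully methylated CpG at positions 4-5.
    sequence, calls = call_and_revert("ACGTTGA", "ACGTCAA")
    print(sequence, calls)
```

Because the methylation calls are recorded before the bases are reverted, the reverted sequence looks like an ordinary read and could be passed to any standard mapper, which is the point made in the talk.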
Multi-Omics - benchmarking SBX-DM with GIAB & Cell lines
OK, so comparing straight up SBX-D versus SBX-DM, looking at the length side, quite comparable. The SBX-DM is a little shorter in this particular comparison, but actually had higher coverage. So it's not like it was so much shorter that it's a big deal there. We see a little difference in F1 scores for both SNV and indel, and this could be bioinformatics, some tweaks we may want to make, or chemistry, or a little bit of both. But we were actually giddy, I think was the word we used in the meeting, seeing how well we did right off the bat with this comparison.
Similar to what we've shown before, about five billion reads in one hour sequencing, giving us way over obviously 30x coverage. And that's using concordant duplex bases only for the coverage, just to be clear on that.
OK, comparison to the AF values between both are you know, on par. Nothing really surprising. This is looking at the two different cell lines shown below there, so really as expected.
Looking at the methylation status, or the level of methylation that we're seeing against several other technologies. Again pretty much in agreement with what we're seeing with other technologies. In particular, looking at TruSeq, the histograms look very, very similar there. So that's good.
And then focusing on methylation patterns. So, on the left looking at, you know, the methylation patterns near transcriptional start. We actually used this as an opportunity to test SBX-SLR, to generate, do the gene expression and then generate the high and low categories of expression there. And then took SBX-DM to assess the methylation levels. And then of course applied that to show that the difference that we're seeing between the low and high actually makes sense. And it does.
And similarly, looking at the methylation across different genomic regions, pretty much overlapping there. So all good stuff with that.
Now pulling it all together to kind of leverage the power of all three of these approaches here using SBX-SLD (DNA) for the haplotype phasing. Then SBX-DM for co-detection of DNA and methylation variants, and then SBX-SLR (RNA) for gene expression phasing. So what does that look like? And this was our favorite slide. And I can just tell you we all really love this slide. So, focusing on SBX-SLD here. You see the haplotype phasing using the heterozygous SNPs. You can, it's kind of hard for me to see for the picture here, but you can get an idea of the two, the two haplotypes there as they’re hopefully coming through on your screen there.
And then looking at SBX-DM, you see the methylated haplotypes for haplotype one, and then moving on to haplotype two, you can see the unmethylated G to A SNP. You can see the unmethylated haplotype, and you can see the methylated on both haplotypes. So again, you're getting that with the SBX-DM.
And then lastly, SBX-SLR, you're seeing the allele specific expression of the unmethylated copy there that we point out, as well as the Exon-Intron boundary, so really cool. Again, just brand new stuff. Some of this data was just generated within the last few days. So really exciting and we can't wait to carry this on further.
And on point here. So Mahdi will be presenting at AMP. We're going to take this and amp it up a little bit more as we get close to AMP and he'll be presenting on that. I'm actually going to come back to Boston just to see Mahdi present this. So I'm excited for that.
So again, I wanted to get a little bit of a test looking at some FFPE, buffy coat and tumor normal samples, and get a sense of what that looks like. So we just had five samples of breast cancer, bladder, CRC, for all three sample types and we ran them through just to kind of see what we're seeing here. So we did standard SBX-D, looking at tumor informed MRD in the 60 to 90x range. This is something we've done before showing great data on at ESHG. So we were able to pick five out of five subjects, including a very low TMB sample. So that looks great.
Then looking at SBX-DM, we compared at 30x coverage, and the reason why we did 30x was that the yield on the one time we ran SBX-DM was a little bit low for a couple of samples. So we decided to actually down sample both to 30x for the comparison, and that's what we did here. And we're able to see four out of five subjects for both, but it wasn't like we saw five out of five in SBX-D at that coverage. So we saw very similar performance from both at similar coverage. We'll work on making sure that we understand whether that was just a coverage-driven drop-off or not, but out of the gate, really exciting stuff.
And again the SBX-DM preserves methylation signal as complementary information for MRD detection. And so we wanted to carry that a little bit further looking at the 30x SBX-DM, looking specifically for cancer specific methylation signals. So we started off with the left looking at differentially methylated sites. And to do that we analyzed paired FFPE blood DMS using SBX-DM.
So we got the count number at that, then we intersected that with the previously identified cancer specific methylation reporters to narrow the population down a bit. So it narrowed it down, and then we intersected one more time with our cfDNA to get a cancer specific DMS detection in plasma. So again, this was very recent data, but very exciting to see that we could see that, and Mahdi will cover quite a bit more of this at AMP in a few weeks. Some pretty good stuff, and again, complementing the SNP information to improve MRD detection, especially in low-TMB samples, is something that we're, you know, really excited about, bringing both of these things together with the SBX technology.
So, target enrichment. I get a lot of questions about this. I just wanted to cover it here because I think people deserve to understand what we're doing. So this is actually an SBX simplex approach, using a lot of the same Y adapters I've shown before that would then go through a pretty typical Probe Hybridization, PCR and actually we do use PCR pre and post target enrichment for the steps, and then bringing it through Xpandomer sequencing as normal on the sequencer.
But these read lengths are in a range that is just a workhorse range. So, you'll see the number of reads we're able to generate here. But as with most of these target enrichment applications, you're going to have families in clusters that you then use intermolecular consensus to collapse, both with duplex clusters we show on the left, and simplex clusters on the right to get a consensus read for both types. So that's the basic approach and general approach most people use for that.
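The family collapsing described above can be illustrated with a simple per-position majority vote. This is a generic sketch, not the method used in the Roche pipeline; it assumes reads have already been grouped into families by some identifier (for example a UMI), are aligned to the same start, and have equal length. All names are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Majority base at each position; ties are reported as 'N'."""
    result = []
    for column in zip(*reads):  # assumes equal-length, pre-aligned reads
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append("N")
        else:
            result.append(ranked[0][0])
    return "".join(result)


def collapse_families(tagged_reads):
    """Group (family_id, sequence) pairs and return one consensus per family."""
    families = defaultdict(list)
    for family_id, sequence in tagged_reads:
        families[family_id].append(sequence)
    return {fid: consensus(seqs) for fid, seqs in families.items()}


if __name__ == "__main__":
    reads = [
        ("u1", "ACGT"), ("u1", "ACGA"), ("u1", "ACGT"),
        ("u2", "TTGA"), ("u2", "TTGC"),
    ]
    print(collapse_families(reads))
```

In the example, the single discordant base in family `u1` is outvoted, while the two-read family `u2` cannot resolve its last position and reports `N` there.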
And then focusing on the right hand side there, from two nanograms of cfDNA using the KAPA HyperExome V2 kits, we would be able to do about 48 samples in four hours at 340x unique coverage.
And the Phred scores on that would be about 45 to 50 for all clusters versus duplex clusters. So quite respectable there. And we're looking at over 184 billion reads in 24 hours if you did the six runs as I indicate there, or roughly 288 samples in 24 hours at 340x. So, if you were more interested in germline 30x, you'd be able to do almost 3000 in 24 hours. And the actual read count is probably a quarter trillion, and I had to get that in just so I could say the word trillion! So I'm just giving that one up.
Which I'm very excited to keep pushing towards that kind of number of throughput. So anyway, that's target enrichment. I'm really happy to get this into some hands with early access, and see where you go on that.
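For readers less familiar with Phred scores, the standard definition is Q = -10 · log10(p), where p is the probability that a base call is wrong. The sketch below converts the quoted scores into error probabilities and reproduces the samples-per-day arithmetic. The linear scaling from 340x to 30x is a naive assumption made here for illustration, and the function names are hypothetical.

```python
def phred_to_error_probability(q: float) -> float:
    """Standard Phred definition: Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def samples_per_day(samples_per_run: int, hours_per_run: float, hours: float = 24) -> int:
    """Samples processed in back-to-back runs of equal length."""
    runs = int(hours // hours_per_run)
    return samples_per_run * runs


def rescale_samples(samples: int, coverage: float, target_coverage: float) -> float:
    """Naive linear scaling: lower coverage per sample means more samples."""
    return samples * coverage / target_coverage


if __name__ == "__main__":
    for q in (45, 50):
        print(f"Q{q}: about 1 error in {1 / phred_to_error_probability(q):,.0f} bases")
    daily = samples_per_day(samples_per_run=48, hours_per_run=4)
    print(f"{daily} samples per day at 340x")
    print(f"{rescale_samples(daily, 340, 30):,.0f} samples per day at 30x (naive scaling)")
```

Six four-hour runs of 48 samples give the 288 samples per day quoted in the talk. The naive rescaling to 30x gives a figure of the same order as the "almost 3000" mentioned, which suggests the spoken number was a conservative round-off.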
So just a little jaunt to FFPE, and Jagdeesh covered this in his talk as well, but we do FFPE with a little DNA repair step, and we’re finding that that really helps with the quality of our sequencing there. And then run it through the process very similar to how we’ve covered before. In this particular study we did 18 matched tumor normal samples, across a range of qualities, and what we saw, and the experimental details are on the right, but essentially, normalized 100 nanograms for each, across both technologies, only greater than 70x. They’re all pretty well matched in terms of the coverage, so there was no advantage there in either direction, so around 70x is shown, and what we see is really good per base accuracy, really good error rate by substitution type, and really good homopolymer accuracy for the blue SBX, so really happy with that.
This is a poster, and Mahdi will go into more details looking at concordance again against Illumina. Basically the take home is, highly concordant. He'll go through this in more detail to go through the examples if you visit him at the poster, he’s quite exceptional. So I would recommend people go and sit and talk with Mahdi as well.
OK, so MRD, I showed a version of this at ESHG. We've just added more samples to the data set here. It's actually a 96 sample set, so expanded a little bit. Maybe a little bit more bottom of the tube samples here. So, a little bit more challenging, but carrying this through an MRD, SBX-D workflow, and again, 96 samples.
What we saw here was that 41 out of 47 MRD samples were called correctly. I think this is actually quite an impressive outcome for this result. And I think there'll be a lot more of this coming up in the coming months. And Kendall there on the right, she'll be covering this in her poster, so I invite people to go see Kendall. Kendall's a key member of our biochem team and does a lot of the things that make us able to make these Xpandomers. So, I encourage people to go visit Kendall. She'll do an overview of the technology there as well and answer questions.
OK, the big finish, and we'll hand off to Katie soon after this. So, SBX-Fast is essentially a PCR-free, linear-amplification-free workflow, as we say here, running through the SBX-D protocol. And we've talked about this at several meetings already, but essentially the idea is, how can you quickly go through and get a single genome or a trio genome? And so just a little bit of a snapshot here. We've done quite a bit of work on different samples over many months and essentially we're able to identify a number of different types of variants in a number of different samples. This is previous data, and we added a bunch more. All of them we're able to identify correctly.
OK. So, the big number. We have now sequenced a genome, and this is an HG002 sample, from DNA sample through to VCF in three hours and 59 minutes. And we've done this many, many times, as Katie will show. So really, really excited. And we're not just doing this as a vanity thing. There are a lot of really good reasons to want to be able to do this, as Katie will cover, and so it's really exciting to be able to see this type of result, and the impact that's going to have.
And I think there's a lot of other things, again, focusing on the flexibility of SBX that we're going to be able to do. This is a project that we started talking about last summer, working on in earnest in November, and the teams really came together. It's a fantastic group of people that we work with to make this result, to be able to demonstrate this. And so, I think pretty exciting there. So, three hours and 59 minutes.
This kind of breaks down the processing steps a little bit, so you can get an idea of the timing, and Jagdeesh will have a poster that he'll go through to talk about some of the SBX-Fast work there. And as I mentioned before, we've got a great presentation tomorrow. I encourage everybody to go and look at some of this SBX-SLR work. Both Brian and Emma will be covering that, and Yutaka Suzuki will also be covering some of the spatial work. And I think it's going to be a great demonstration of the things you can do with SBX. I encourage people to go to that.
And so, the last thing here. So, I can remember when I got excited by seeing a single X-NTP extend off the end of a primer. And I actually, you know, that was 2014. So, it was seven years just to get to that point. And so, I think the one last thing I’d like to finish on, the message is, essentially this is pretty hard work that we do here. And you know, everybody has their fears, fear of failure, fear of not getting money, not being able to fund your work or run your projects. And I think the thing I've learned over all the years of what we've done for SBX is you've got to be able to turn that into creativity and find the grit to keep going.
Because I can think about so many different reasons why it would have been so much easier just to kind of not try and solve all these problems. But I've had the great fortune to work with wonderful people, and we figured things out, and we've carried on, and we've pushed forward to keep inventing and using that energy to drive things forward. And I think that’s the message is, we've got a lot of challenges in front of us for all of our work that we have to do. And I think we have to persist and keep pushing through and keep innovating in all the work we're doing. And I hope that SBX can help contribute to that and help people with their projects and move sequencing into a new era. And I believe firmly that we owe that to the people whose shoulders that we are standing on now, the brilliant people that we have had the opportunity to learn from, as well as the people who are the next generation, who we are modelling and showing the way for.
So again, I'm really excited about being able to be here, the things we've been able to do and just to be able to talk to the team here. So, with that, I thank you very much.
Overview of AXELIOS sequencing system and research applications in whole genome sequencing.
Mitu Chaudhary: Welcome, everybody. Good afternoon. I will start with a brief introduction of myself, Mitu Chaudhary. I am a Senior Director, International Business Leader for Sequencing Systems at Roche. And today I'm here with my colleague Jagdeesh, who will introduce himself when he comes on.
We'll talk a little bit about SBX technology, with just a very brief overview of the AXELIOS platform. And we're going to talk a little bit more about SBX Duplex in the context of whole genome sequencing. There is a lot in SBX that we are actually going to cover over the next two days, and we'll talk about some of our posters as well as our workshops.
So before I go in a little bit more into the details of SBX, what I wanted to talk about is just from a sequencing perspective within Roche, we have tools and technologies that we have been implementing over the past several years.
This is primarily in our KAPA portfolio, with various tools in library prep and target enrichment that span DNA and RNA. And we do have some software pieces as well with our navify portfolio. And then what we have done is taken these tools and technologies and developed assays, for example, the AVENIO Comprehensive Genomic Profiling Kit, which is also automated on our walk-away platform, that's AVENIO Edge. But really, the vision here is to fill that gap in terms of sequencing with the very flexible and very capable SBX technology, which is kind of our focus for today as well as tomorrow, and really building this ecosystem.
Now, the thought process here is the tools and technologies that we are developing are very capable in a standalone fashion. And you could use them in the application area of your choice. But then how do we actually bring all of these things together and form a clinical solution going into the future?
And so this is kind of how the vision is evolving. In each of these areas, whether in sample prep, in sequencing, and analytics, you have very capable tools that fit right into your lab with the flexibility and the capability of the technology. But then really bringing these pieces together and developing assay solutions over a period of time.
And these assay solutions could look like, for example, in the genetic space, but also in oncology with applications like MRD. We're actually going to talk quite a bit about each of those in this presentation, also in our workshop at 3 o'clock, and another workshop tomorrow at 12 o'clock.
The other piece I want to draw your attention towards is this QR code that will actually take you to our website. Given the time we have and the information that we have, we really cannot cover all of that. So if you want more background information specifically, I would highly recommend to go onto that website.
We started with the technology introduction back in February. There is a webinar in which my colleagues Mark Kokoris, who is a co-inventor of SBX, and John Mannion, who heads our computational biology team, provided a very good overview of the technology, and the seminal paper came out on bioRxiv at that same time.
We also have presentations from AGBT together with our early collaborators from Hartwig Medical Foundation and the Broad Institute. And there are two groups at the Broad Institute that have collaborated with us, both in the germline space as well as in the single cell area. So those presentations are on our website.
At ESHG, we actually presented even more up-to-date information on these workflows, together again with Hartwig Medical Foundation and Broad Clinical Labs. And one of the highlights that we presented there was what we call a fast workflow, and Jagdeesh will talk a little bit more about that, where the team actually showed DNA to VCF in under five hours.
More recently, we held a webinar back in September. There, we gave quite a bit more detail about the SBX Duplex read characteristics as well as the data structure. And there is a link here, and if you go to the website, you can see that.
The other thing associated with that webinar is a data release that covers HG001 to HG007. So it's seven genomes in one hour of sequencing with more than 30x duplex coverage. That data set has been released as well, together with small variant calling from what we call our Zeus analysis pipeline.
So the small variant calling is there. There are also two technical papers associated with the presentation that give a little bit more detail on the duplex read structure and on how the Google DeepVariant team has also adapted DeepVariant for the SBX data type. So I would highly encourage you to check out these resources.
For whatever reason, and I do not recommend missing it, but if you do miss our presentations later today and tomorrow, then all of these workshops will be posted on the same link that you see here in about two to three weeks' time.
I'll give a very brief overview of SBX. A lot of this information is already contained on the presentations that I was talking about on the website. But very briefly, the SBX technology is built around a single concept of flexibility. And we will talk a lot more about flexibility over the next two days.
But then also thinking about how do we then scale this into the future? How do you adapt something like that in your current operational lab? There's a lot to think about. We are still thinking about it. Everybody has been used to doing sequencing in a very specific way. SBX is very different, and we'll talk a little bit about that.
But then how do you actually adopt it in your lab? You will start seeing some of that from our early collaborators in the presentation today as well as tomorrow. So I'll start with accuracy, which is something that's just table stakes. You need that. And what we have shown is very high accuracy.
And again, on small variant calling, extremely high performance. And this is with a method in which we use linear amplification. But you can see that the variant calling performance is actually approaching what is currently called the gold standard, with PCR-free approaches.
But where we see the main difference with SBX technology is the flexibility that it allows you to actually build your workflows based on your experimental needs. And what you can do is really think about the accuracy, throughput, as well as the time.
And there are several aspects that you can actually adjust based on your needs. And Jagdeesh will talk about those workflows in his presentation. From a throughput perspective, as I just mentioned, we did a data release that's seven genomes with more than 30x coverage across each genome in just one hour of sequencing. There are two modes of operation.
Today's presentation will talk about SBX Duplex, which is our high accuracy, high throughput method. And we typically see the insert sizes in about the 150 to 300-plus range. And there is another method, called the Simplex method, which can actually go much longer than that. And that method is what I call very, very high throughput.
So there are two modes on SBX. There's a high throughput mode, and then there's an ultra high throughput mode. Time to result, as I mentioned, was shown by our collaborators at Broad Clinical Labs to be less than five hours. And then the sensor module, which is kind of the chip that we use in our sequencer, is also reusable.
And we'll have a lot of those details that are on the resources that I mentioned on our website.
The combination of two technologies has resulted in SBX. On the left-hand side, the first panel here is the base chemistry of sequencing by expansion. As I mentioned, it was co-invented by Mark Kokoris, the presenter of that initial webinar, at Stratos Genomics, which Roche acquired.
It was built on a very simple fundamental idea. If you want to measure in a nanopore, don't measure the DNA, because the bases in DNA are very tightly packed with each other, so you can't resolve them very confidently. So the DNA information is captured in these reporter codes that are 50 times longer than the original molecule.
So that's kind of the baseline here. The resulting molecule is called an expandomer. And the expandomers are generated in an automated fashion on what we call a synthesis system. And you can pool multiple libraries into one expandomer and make four expandomers at a time. I would encourage you to visit our booth. We do have the systems there, and we can talk a little bit more about the operation of these instruments.
Once you do that, then we do single molecule measurement using a sensor module, which is a nanopore-based method. This was a technology that Roche acquired from Genia back in 2014. This is a very, very high throughput chip with 8 million microwells. And this is a sensor module that is reusable. And the primary reason for that is that in every run, the bilayer and the pores are regenerated. And also, we're not measuring DNA. We're measuring expandomers, which are very large complex molecules, and they don't want to stay in the wells.
So there is a stringent process through which all of the bilayer, the pores, and the expandomers are washed away. The combination of these results in very clear, discrete signals. There is a lot more detail on the expandomer, which we don't really have the time to go into today.
But I highly encourage you to read the paper as well as watch the webinar to understand how some of the mechanics of reading the expandomer through a nanopore actually work.
This is the solution. You can again see it in our booth.
So you start with your multiplexed and indexed libraries. Every library pool can be loaded onto the synthesis instrument. Up to four expandomers can be created. And then each expandomer is approximately a four-hour run on the sequencer.
But the beauty of this operation is that you can actually adjust the timing of sequencing based on the samples you have. So you're not limited by really filling the flow cells, so to speak, which is typically how you think about sequencing.
Here is just some of the details. So this is our booth. The instruments are there, the AXELIOS instruments. And we do have a presentation today in room 153 from 3:00 to 4:00. There's a lot there. There's RNA, there's DNA, there's multi-omics with methylation. So if you are interested in that, please do come by.
And then we have another one tomorrow where three of our other collaborators will also have a presentation. We are currently in early access with a commercial launch planned for 2026. With that, I will hand it over to Jagdeep.
Jagdeesh Chandrasekar: Thanks, Mitu. So I'm Jagdeesh. I'm the director of advanced SBX applications. I've been with Stratos Genomics since 2016, when Mark Kokoris hired me as a young graduate student. So it's pretty exciting to see how far this has come. And I'm really excited to talk about the SBX Duplex, or SBX-D, workflow.
But I want to take a step back and talk about the two different SBX library structures that Mitu was alluding to earlier. So you have SBX Duplex over here, but you also have Simplex, which is the ultra or super high throughput type reads. So in Duplex, if you have a double-stranded DNA, you can connect those two with a hairpin. And when you sequence them, then you get read one, hairpin, and read two in a single read. So that's very high throughput. And you also get high accuracy because bioinformatically, you can do consensus or error correction.
And that's how you get high throughput, high accuracy sequencing reads. The read lengths for those vary between 200 and 300 base pairs. And the applications for those would be germline whole genome sequencing, of course. But then also somatic oncology, where you're looking at low allele fractions or complex mutations, and that's where you really want to have high accuracy.
For FFPE, where your samples are heavily damaged, getting Duplex high accuracy reads can be invaluable. For MRD, where you're working with cell-free DNA and looking for low tumor fractions, you want to be able to sequence at as high an accuracy as possible to get the best signal to noise.
Then there are other applications, methylation and rapid whole genome sequencing, which Mark Kokoris, head of SBX technology, will be talking about more in the upcoming presentation at 3 o'clock. On the other side is Simplex, which is ultra high throughput. So these reads could be very powerful for high-counting applications such as single cell, transcriptome, gene expression, then proteomics, fragmentomics, spatial, and so on and so forth. But you can also start pushing on the length, so 500 to 1500 base pairs. So this is where you can start seeing structural variants, phasing variants, and then RNA isoform detection.
Now shown here is a very generic SBX Duplex workflow. So if you start with 20 to 100 nanograms of intact genomic DNA, our first step involves fragmentation to generate medium-sized double-stranded DNA. The platform is flexible for both mechanical and enzymatic fragmentation, but it's really optimized for enzymatic fragmentation. Our very first step, importantly, involves adding an SID and Y adapter, which is shown over here. So this is the sample ID. And we tie the sample ID to the sample at the very first step.
Now this is powerful because if you try to bring in the SID at a later stage, through PCR, that's a contamination risk, as opposed to bringing in the SID pretty early on in the library prep. Another key feature of this workflow is linear amplification. Again, if you're doing PCR, you can get some early copying errors that get baked into your library. And those can continue to amplify or propagate as you're doing more and more cycles. But with linear amplification, each of these amplicons is just one step removed from the original library. So these are direct copies coming out of the original library molecule.
So the errors do not propagate, and there is very low copy error in these molecules. The linear amplicons are then fed into the expandomer synthesis instrument, where these DNA molecules get translated into expandomer molecules, which are the polymers. Those are then fed into the sequencing instrument, and then you get your sequencing information.
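The contrast between PCR and linear amplification can be illustrated with a toy model. This is a sketch under strong assumptions (a single starting template, perfectly efficient doubling in PCR, hypothetical function names), not a description of the actual library chemistry.

```python
def pcr_fraction_carrying_error(cycle_of_error: int) -> float:
    """Ideal doubling from one template: a copy made in cycle k is 1 of 2**k
    molecules, and its descendants keep that share in every later cycle."""
    if cycle_of_error < 1:
        raise ValueError("cycle_of_error must be >= 1")
    return 0.5 ** cycle_of_error


def linear_fraction_carrying_error(total_copies: int) -> float:
    """Linear amplification: every copy is made from the original template and
    is never copied itself, so one erroneous copy stays a singleton."""
    if total_copies < 1:
        raise ValueError("total_copies must be >= 1")
    return 1 / total_copies


def mean_copy_events_pcr(cycles: int) -> float:
    """Average number of copying events between the template and a molecule in
    the final pool: half the pool is newly synthesized in each cycle."""
    return cycles / 2


MEAN_COPY_EVENTS_LINEAR = 1  # each amplicon is one step removed from the template

if __name__ == "__main__":
    print("PCR, error in cycle 1:", pcr_fraction_carrying_error(1))
    print("PCR, error in cycle 3:", pcr_fraction_carrying_error(3))
    print("Linear, 20 copies:", linear_fraction_carrying_error(20))
    print("Mean copy events after 10 PCR cycles:", mean_copy_events_pcr(10))
```

In this model an early PCR error is inherited by a fixed, large share of the final pool, while an error in a linear copy never gains descendants, which is the "one step removed" point made above.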
So for SBX-D read categories, we have two categories that you end up with on the data side. You can have the full-length reads, shown over here. These contain read one, the hairpin, and a full-length read two. So if you fold those two reads on top of each other, you get a consensus.
And you get these duplex bases, which are high quality. Now there is another category, which is shown above. These are the partial duplex reads. You get a full-length read one, then a hairpin and a partial read two. So when you fold these two structures on top of each other, you get partial duplex bases.
But you also get these simplex bases. They're still valuable because you can use them for mapping, but they are lower quality compared to the duplex bases. And we leverage both of those types of information. So shown here is the SBX-D variant calling performance.
The first thing that really pops out is that in one hour of sequencing, we get 5.1 billion duplex reads. Those are very remarkable numbers. And that translates to over 30x concordant duplex coverage for seven genomes in under an hour. Those seven genomes, in this example, are the standard Genome in a Bottle references.
So shown here is the mean insert read length, which is between 240 and 250. And then the median coverage, which is very uniformly greater than 30x. Now using GATK plus Roche machine learning, we generated the F1 scores. And these were very respectable SNV and indel scores.
I do want to point out that these are amplified molecules. So these are, again, very respectable SNV and indel scores. The Google DeepVariant team took our data and performed their own analysis, and they got very similar F1 scores as well. In fact, theirs are slightly higher. So as Mitu was pointing out earlier, we have had a very collaborative effort, and we have shared this data publicly. So I would highly recommend you refer to the data webinar.
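As a rough plausibility check on the read counts and coverage quoted above, raw coverage can be estimated as reads times insert length divided by genome size. The sketch below assumes a human genome size of about 3.1 billion bases and a mean insert of 245 bases (the midpoint of the quoted range); both values and the function name are assumptions for illustration.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate human genome size


def mean_raw_coverage(reads: float, mean_insert_bp: float,
                      samples: int, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Upper-bound coverage per sample if every insert base mapped uniquely
    and reads were split evenly across samples."""
    return reads * mean_insert_bp / (genome_bp * samples)


if __name__ == "__main__":
    cov = mean_raw_coverage(reads=5.1e9, mean_insert_bp=245, samples=7)
    print(f"about {cov:.0f}x raw coverage per genome")
```

This raw estimate is an upper bound. It is expected to sit above the reported figure of more than 30x concordant duplex coverage, since only part of each read yields duplex bases and some reads are lost to filtering and mapping.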
So now SBX-D using FFPE samples. These are more challenging samples. They get damaged because of formaldehyde fixation and paraffin embedding, so you can see they accrue a lot of damage. So one of our steps involves cleaning up or reverting this damage. It won't go away completely, but you do the best possible cleanup.
And then you take it through our duplex sequencing process. You put on the hairpin, the SID, and the Y adapter. Then you go through the duplex library prep. Some of these mutations can still trickle through, right? But then you proceed through linear amplification. And then you get your error, which is shown at the bottom.
But bioinformatically, you can look at these asymmetric errors and either mask them or assign them a low Q score. So that's again showcasing the power of duplex sequencing using SBX-D.
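The folding idea described above (read one, hairpin, read two, then masking positions where the two strands disagree) can be sketched as a toy consensus function. This is not Roche's algorithm: it assumes no insertions or deletions, assumes the hairpin has already been trimmed, and the three quality values are made-up placeholders.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Hypothetical quality tiers, chosen only for illustration.
DUPLEX_Q, SIMPLEX_Q, DISCORDANT_Q = 40, 20, 2


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(read1: str, read2: str) -> tuple[str, list[int]]:
    """Fold read2 back onto read1.

    read2 starts at the hairpin, so its reverse complement lines up with the
    END of read1. A partial read2 leaves the start of read1 as simplex bases.
    """
    folded = reverse_complement(read2)
    offset = len(read1) - len(folded)  # number of simplex-only bases
    if offset < 0:
        raise ValueError("read2 is longer than read1 in this toy model")

    bases, quals = [], []
    for i, base in enumerate(read1):
        if i < offset:                      # covered by read1 only
            bases.append(base)
            quals.append(SIMPLEX_Q)
        elif folded[i - offset] == base:    # both strands agree
            bases.append(base)
            quals.append(DUPLEX_Q)
        else:                               # asymmetric error: mask it
            bases.append("N")
            quals.append(DISCORDANT_Q)
    return "".join(bases), quals


if __name__ == "__main__":
    # Partial read2 with one strand-specific mismatch.
    seq, quals = duplex_consensus("ACGTAGGC", "GCTTA")
    print(seq, quals)
```

In the usage example the first three bases are simplex only, the mismatching position is masked as N with a low score, and the remaining positions are concordant duplex bases.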
Our inputs are from 100 to 300 nanograms of genomic DNA for FFPE, just because you typically get more sample. And it works pretty well for this workflow. For minimal residual disease, or cell-free DNA, as you're all aware, this doesn't require any fragmentation.
So you start with 2.5 to 10 nanograms of input. You put on the SID and the Y adapter. Then it goes through the linear amplification process. Again, a very similar workflow. It fits in very nicely. The one thing I want to point out again is the introduction of the SID pretty early on, and linear amplification. These really help minimize the noise. So this really starts adding up for MRD.
Now I'll switch gears and talk about SBX-Fast. If you're in a scenario where you have DNA and time is of the essence, you need DNA-to-VCF calls, you do not have the luxury of waiting for 48 hours, and you cannot plex 96 samples, or 48, or even eight samples, this is where the flexibility of SBX technology starts showing up: the flexibility and adaptability to do lower plexing and get from DNA to VCF as soon as possible. In this case, again, I do want to call out the amplification-free workflow, because we are trying to save time.
So the input, we typically go with two micrograms of DNA. Could either do a solo or a trio. There is flexibility. Solo would use one sample ID, but trio would use three different sample IDs. You put on these adapters, you multiplex them if you have more samples, and then it goes into the synthesizer, then the sequencer, and then you end up getting VCF files within the same day, which you could then upload to the tertiary server. So again, this really showcases the flexibility and speed of SBX workflow and the AXELIOS system.
So in one hour of sequencing, and again, this is SBX-Fast, so the numbers are a bit different, you get 3.4 billion duplex reads, again very high. That translates to 50x concordant duplex coverage if you run it as a trio. And we did some benchmarking with the CEPH trio, the Ashkenazi trio, and the Chinese trio.
So these are all very recognizable NIST standards. We ran them as three-plexes, and we ended up getting greater than 50x coverage for these trios. Now, if you just wanted to do one sample, that would be less than 20 minutes of sequencing. So this just shows the high throughput capability of the SBX instrument.
Here are the mean insert read lengths, nearing 200, and the median coverage, very uniform at over 50x when running a trio with one hour of sequencing. And then here are the F1 scores for SNVs and indels using the Roche-Zeuss suite, and then a very concordant result with Google DeepVariant.
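For readers less familiar with the metric, the F1 score used for SNV and indel benchmarking is the harmonic mean of precision and recall. The sketch below shows the standard definition; the variant counts in the usage example are invented for illustration and are not results from the talk.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    if true_pos == 0:
        return 0.0  # no correct calls: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up counts: 9,990 correct calls, 10 false calls, 10 missed variants.
    print(f"F1 = {f1_score(9990, 10, 10):.4f}")
```

Because F1 penalizes both false calls and missed variants, two callers can be compared on the same truth set with a single number, which is how the concordance between pipelines is being described here.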
So we wanted to expand on this and take it further, to see if we can start identifying pathogenic variants. These are Coriell cell lines, shown over here. And then here are the descriptions. And here are the different variant types: small CNVs, large CNVs, SNVs, and repeat expansions.
So using SBX-Fast, we are able to call all these variants internally, as shown over here. But then the Broad Institute was able to replicate our experiments, and using our instruments, they were still able to call all these variants. And again, I just want to point out that all variants were called using the Roche XOOS variant callers.
We further expanded on those.
We have generated VCF files for these different Coriell sample IDs, which are shown over there. Here are the descriptions, again ranging from small CNVs and large CNVs to repeat expansions, SNVs, and indels. And we hope to be able to share this data publicly at some point.
So wrapping up, I talked about SBX-D. This is a duplex-based linear amplification workflow. It's very flexible: 20 to 100 nanograms of whole genomic DNA input, 100 to 300 nanograms for FFPE, and for MRD, 2.5 to 10 nanograms of cell-free DNA.
And then, of course, a question that we get asked regularly is, can you use PCR products? And we absolutely can. You just have to be cognizant of length, but it's very doable. And it's very flexible. So the workflow is three to four hours, which is shown over here, adding amplification time.
So that's leveraging the AXELIOS library prep and KAPA chemistry. The difference with SBX-Fast, which is the amplification-free workflow, is that the input is two micrograms. And the library prep for that is 2 to 2.5 hours for solos and trios.
So the future application space for SBX sequencing is very promising and very exciting. We are excited to see what the technology can do, and the hope is it will broadly enable many sequencing applications. I won't belabor all these different applications, which are shown over there.
I want to take you to probably the most important slide of my talk. Mark Kokoris is going to be presenting on advances in SBX: multi-omics, methylation mapping, and oncology research. And then Katie, from Broad, is going to be talking about ultra-rapid genome sequencing.
So we are very excited. Please do attend the industry education session from three to four. Highly recommended. Then tomorrow we'll have four posters, myself and my colleagues, 2:30 to 4:30. Please stop by, say hi, and please do visit us at the Roche booth. Thank you for your patience. Thank you for your time.
SBX technology for single cell RNA and spatial analysis
Thank you for having me. I'm totally honored. I guess the only reason why I'm here is to demonstrate that this technology has spread, finally, to the other side of the planet. So I'll talk about the application of SBX for spatial analysis. There are some nuances, but this is mostly for the Japanese researchers, to make sure they utilize our core facility.
I'm from the core facility of the University of Tokyo. These days we are doing spatial analysis, mainly using Xenium as a platform. To my understanding, we have experienced a rapid shift from sequence-based spatial analysis to hybridization-based approaches. This is kind of a photo album: we started with Visium, and we spent quite a happy two or three years.
And that, as an imaging approach, is quite powerful for identifying cells and their interactions, such as cancer cell and immune cell interactions, and the data is free from the sequence information of the patients, so we find it easier to use for data sharing. And the procedure is quite robust, so we could start the cancer genome cataloging project.
Maybe the same thing is starting in the US, the UK, and many other countries, but we are finally analyzing cancers in collaboration with Australian, Thai, Taiwanese, and Korean colleagues. So anyway, that's a good thing. But we have also started to learn the limits of Xenium analysis, the hybridization-based platform for spatial analysis.
It's good for confirming, in a sense, what's happening in already known networks or gene expression, and how that happens in the spatial context. In other words, we are just putting the pieces into the spatial layer. So it's not always easy to say anything new about totally unknown things like novel cancers,
for example by drawing data-driven networks and landscape-like things. The sequence-based approach has its own advantage: it's sequence-based, after all, so it's unbiased. And if we can do whole-transcript sequencing, that should include TCR and BCR sequencing and, as we heard, splice variation and allelic expression may be captured. At the same time, in addition to human transcripts, we may be able to analyze bacterial and other pathogen information.
And so it's applicable to other organisms, where probe design is not always possible. But one of the highest barriers to this sequence-based application is that the sequencing platform has not been sufficient in terms of sequencing depth and read length. This is a version of a saturation curve, showing how many genes are represented in a single cell.
And then they were recommended the use of up, up most at minimum, at least 200 or 300 million base. But obviously we tried to increase this, set the sequence depth up to five 5000, 0.5 billion of 500,000, right? None in the five. Okay. So anyway, so again. Okay. Five fold. And, that medium raise and we found that the sequencing is, coming to the saturated at this sequencing depth.
So that's about ten times more than 10x Genomics recommended. So firstly, we tried to see whether we could make better use of the sequencing platform, starting from the Xenium, actually, simply because the Xenium is our own main platform at this phase and is commercially available.
We started the sequencing analysis of this HBV-induced liver cancer. This slice contains 200,000 cells of liver cancer. And instead of using the fragmented library, the final product that we usually use for spatial analysis, we tried to make use of the first amplicon representing the full-length cDNA, even though its read length is just, say, 1 kb or slightly less than 1 kb, because of the limit of the amplification in the standard Visium protocol.
So we first checked whether the expression levels are precisely represented in the long library. Comparing the short-read and long-read profiling, the consistency was almost perfect. And this is the distribution of the human transcripts. And the KB and the PB are here.
This is liver and we also found that in the same library, we could detect. So and, virus transcripts are, you know, they're preferentially found at the zone street. So in the in they're almost a full length. So that's showing that in HBV only 15 predominantly are this region. So that's that's the one that they when we looked at some specific genes or minor express genes in the signal cells.
And, yeah, it's kind of unexpected, kind of expected: the coverage seems to be too shallow, even with the sequencing of about 500 million. We really needed a better sequencing platform. Then I was introduced to SBX, which was launching as a new sequencing platform, and we sent out the libraries.
Firstly a Visium library, the probe-based one, and then 3' end libraries, a couple of libraries, to Roche. And the data was returned. And from that we heard that's not so data DNA sequences were included there. And in one number and they said this is everything was done in Seattle. And so each library generated 15 billion reads.
So that's a big number, 100 times or even more than 100 times higher than, say, the usual Illumina reads expected from a 10x analysis. And we compared the spatial distribution. This is the Illumina 5 billion reads, which, as I said, is quite a big number. And this is the downsampled 5 billion reads from Roche. The results are all humble. All of us are.
Since we actually left the results in the. This is a real one and five billions and compared a web in comparison where they say 12 billion reads. So the results were the aggregated bulk coverage is still this. The consistency was almost 100%. That's great. And this is the sequencing saturation. Of course, the sequencing saturation levels were dependent on the complexity of the library.
But finally it reached a plateau after, say, 6 or 8 billion reads. So that deep sequencing depth is needed for the analysis. At the same time, we were also checking, and found that the sequencing fidelity is no worse than Illumina: 99.8% for Illumina and 99.7% for SBX, which is slightly worse, but they are at almost the same level.
So and this outside was outside. Then there's three prime end library the same the results and this is, you know, we also had to do the sequencing for the assay, the adapters. So the sequencing the reads in things was slightly shorter than one K. But anyway, so essentially the same results. And so Darvish and I mean, sequencing our quality and the sequencing read so and we compare the results with Illumina and SBX and the pollen there's the spatial distributions are almost all spatial expression analysis could be done by using SBX as well.
And we reached the same conclusion as the previous speaker. The first barrier we encountered was that we had to deal with 1 billion reads in the software, right? And that led to software errors. And we found that Minimap2 was the only reasonable choice.
Giraffe was quicker, but it could not separate internal exons in a precise manner. Anyway, we could do the initial mapping for even 10 billion reads by putting all our computer resources into this analysis for a couple of weeks. And this is a result, the HRAS splice variants.
Just to note that HRAS is a relatively short messenger RNA: 1.2 kb is the messenger RNA length. And there are several splicing variants, as the previous speaker mentioned. This is number one, and the others are there. So we could start with 360,000 reads per splicing variant, which is a big number.
Right. Usually we had to start with a 1,000 times smaller number of reads, so it's likely that we would miss those minor transcripts. Anyway, we could map the spatial location of each transcript to the spatial layer.
So I'm not sure whether there's any biological relevance or clinical meaning in this detection of the splicing variants. But anyway, we can now analyze these kinds of splicing variants in the cancer cells. And for the alternative promoter, this is also the case. This is the same gene, and we could detect almost 1 million reads for this gene.
And there is another alternative splice variant, which represents a minor transcript. If we had to start with a library size three orders of magnitude smaller, we may have missed this transcript. So the spatial mapping showed up like this.
So another thing is obviously as expected. One is this we were able to the said detect, transcript. And that we were able to detect best price and resource and piece and the for example, we did this genome sequencing, whole genome sequencing using that Visium. And the G was the dominant. It ran we looked at the genomic level mutation and the in some cells, not a not a large number of cells, but the, some cells that minor allele was dominant the transcribed and in a, in a in a certain layer.
And the same is true for the five prime UTR and three prime UTR is in VS. And also there's some deletions. So we we're now in for the there are some limitations. We have to narrow it. We have to focus on I say relatively it's a short messenger RNAs, but the still we were able to detect.
So those kinds of mutations. And perhaps one of the biggest issues is the TCR and the BCR, the B-cell receptor, the immunoglobulins, and the TCRs. The good thing is that the messenger RNA length is just 1.1 kb or somewhere around there. So this is the TCR, and this is the BCR.
So in the same library, we looked for the TCR and BCR sequences, and we found that we could detect the exact sequences from quite a large number of cells. And this is, say, 12,000 TCR/BCR.
And this is the other the cell at the Caso, the two genes. And they're building and you know cancers and these are the TCR, BCL and the cell and the distribution of the t t as are far beta. And they are in a growing HG and IGA and a top that the most frequent, you know, the, clonal types I showed up like.
That's right. So and there are we found there are lots of still cubbies like this, that we, I, you know, examined and found this division AMD in these busy MD. The cell segmentation is still a problematic. I don't like this because they don't make use of this cell surface marker for the precise cell segmentation. That's one thing.
Another thing is when we go T cell, the billions of these tens of billions were is and we, detected some leaky RNA leaks, so-called ambient RNA. So which was not a problem when we you know, scratched in the playground, in the shallow in the field. So that's a problem. But anyway, that's it's really that we could detect, I think, to detect this in a could b cell things and the so the, the cell and TCR BCR is a kind of an on the, yeah.
As usual thing as expected thing. But they when we try to go further into details like the transcription factors in the immune cells, minor cells, and we found that the sequencing complexity itself is not enough for T to do with this, the ultra deep sequencing. And initially it was concerned that say this is the Visium library and this is a 3' Visium library, the sequence based and probe based.
And we compare the complexity and the found. That's the Visium and the three three ply makes this difference ten times more complexity than even, from Visium. And when we think about the complexity of the library and the perhaps the best performing platform procedure is the Chromium class single cell cell dissociation based ones. And the actually, we asked for the additional sequencing to Roche to do the usual conditional, 12 hold on cell Chromium blood cell library.
And we I was obviously almost lost are we were able to detect the TCR BCR things within the library and we our labs are rather interested in the sequence library complexity. And we found this the source complexity was this is the level of the three prime and ten times higher. Right. And the each cell representing say one 1500 or the 2100 genes.
So ten times higher. So and the correlate we started with the 10x Chromium because that's the easiest platform. And the currently we learn that the five temp medium level, medium cell, millions of cells level in a single cell isolation is possible for on the cell. It's the association based like a process. And the has scale bigrams or things.
And, we and this is, from the usual secret single cell sequencing library that they, we can, you know, we can expect. So, the style that's, of southern gene level folder complexity parcel. I mean, so, there are similar plot Tom was still sticking to, I can say, sticking to the cell dissociation. Perhaps the representative one is there, Curio Seeker just by Takara Japanese company.
So by chance and their cell and that we are either. Yeah, yeah. Very well. Request. I want to request that you they're asked for it to do the sequencing for the, so Chromium based once dense and cell dissociation based ones. And the good thing is there's the. So ATAC-seq, special ATAC-seq is enabled for this era at a Curio Seeker the platform cell that we can expect both the RNA sequencing and ATAC-seq at the older, deep sequencing, level.
So I'm sorry for the messy way of speaking, but we are so excited, looking at the initial batch of data produced by Roche SBX. Everything is so new, and there are lots of things we have to do.
And perhaps, unlike the giant places of the previous speakers, the Broad or the Sanger, my place is mid-sized, but can be more flexible. And my place is open for any kind of international collaboration. This is my last slide. First of all, I would like to thank all the staff from Roche for their dedicated support.
It. Yeah I'm not at the position to say thank you. And we I believe I hope at least they are feeling like we are already a team, right. Heading for the same direction and sail and the internet without the you know. Yeah. I really appreciate their support and having a, you know, allow me to better access to the Roche and I think that's it from my talk.
And if any of you are interested in any part of my presentation, please feel free to contact me at this email address: [email protected]. So that is my talk. I'll stop here, and I'm happy to take any questions. Thank you very much for your kind attention.
Characterization of cancer transcriptomes and fusion isoforms using Roche SBX technology
All right. Thanks. First, I just want to thank Roche for giving us the opportunity to be one of the first to look at this exciting technology for transcriptomes. It's just incredible. So I've worked in Broad Clinical Labs, and there we leverage sequencing technologies for biomedical applications. This includes cancer research and cancer clinical diagnostics.
And for this, we find that RNA-seq is an incredibly powerful tool. It gives us a variety of different genetic and functional readouts, and we decided to explore sequencing bulk transcriptome data from cancer cell lines as a way of exploring SBX RNA-seq. We worked in collaboration with the DepMap group at the Broad Institute.
We selected 96 different cell lines, and we prepped the cDNAs and targeted the instrument for sequencing. We copied the cDNA into the xpandomers and then sequenced the xpandomers through the pores, reading out the reporters attached to each of the xpandomer nucleotides. One of the things to mention here is that during the library prep, there was a step where we had to ligate SBX-specific Y-adapters to the cDNA prior to doing the linear amplification step, and then do the xpandomer synthesis and the sequencing.
And the reason why this is important is because we're actually sequencing both strands of the cDNA, and we're sequencing it from the very ends of the molecules. Altogether, we sequenced 102 billion reads. This is about 1 billion reads per cell line. And it was done in the course of 36 hours. That is 2.8 billion reads per hour, which is just pretty amazing.
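As a quick sanity check on the figures quoted in this part of the talk, the throughput arithmetic can be reproduced in a few lines. This is only a sketch using the numbers stated by the speaker (102 billion reads, 96 cell lines, 36 hours); the variable names are illustrative.

```python
# Figures as stated in the talk.
total_reads = 102e9      # reads sequenced across the whole run
cell_lines = 96          # number of cancer cell lines
run_hours = 36           # duration of the run in hours

# Average reads per cell line and overall reads per hour.
reads_per_cell_line = total_reads / cell_lines
reads_per_hour = total_reads / run_hours

print(f"{reads_per_cell_line / 1e9:.2f} billion reads per cell line")
print(f"{reads_per_hour / 1e9:.1f} billion reads per hour")
```

Both results agree with the rounded values given in the talk (about 1 billion reads per cell line and 2.8 billion reads per hour).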
96% of the reads aligned to the genome quite well. Shown at the top here is the distribution of alignment lengths across all the 96 cell lines. You can see that they're very consistent. The mean and the median alignment length was around 500 base pairs. And you can see from this distribution that most of the reads we're getting are what we'd consider medium-length reads.
There are quite a few reads that we'd consider as going into longer-read territory, so more than 1 kb. 4% of them were more than 1 kb, peaking out at around 2 kb. Now, all these reads are useful. You could use them all for gene expression analysis, but the longer reads are super useful for doing isoform-specific expression and also annotations, and we'll see how we use those.
One thing to mention here is that we made this decision to sequence, initially, from the ends of the molecule. As you can see down here below, the coverage distribution reflects that. For the shorter to medium-range transcripts, we're getting really good coverage across the entire molecule. But for the really long transcripts, we're really just going to see sequencing coverage towards the ends, sort of bleeding into the center.
This still has some impacts, as we'll see. As far as the read accuracy is concerned, these are simplex reads, so the expectation is that they're going to be around Q20, or a little bit higher than Q20. Here we're not looking strictly at sequencing error. We're just looking at the alignment differences, which are a combination of sequencing error and the actual true variation that we see in these cancer cell lines.
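For readers less familiar with the Q-score notation, the standard Phred relation Q = -10 * log10(p) links a quality score to a per-base error probability p. The sketch below applies it to Q20 and to the roughly 500-base alignment length mentioned in the talk; the function names are illustrative and are not part of any SBX software.

```python
import math

def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    return -10 * math.log10(p)

# Q20 corresponds to a 1% per-base error rate, i.e. 99% accuracy.
q20_error = phred_to_error(20)

# Expected number of erroneous bases in a ~500-base alignment at Q20.
expected_errors_500bp = 500 * q20_error

print(f"Q20 error rate: {q20_error:.2%}")
print(f"Expected errors in 500 bases: {expected_errors_500bp:.1f}")
```

Because the talk measures alignment differences rather than pure sequencing error, true variants in the cell lines also count toward the observed mismatches, so the observed rate is an upper bound on the sequencing error itself.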
And it's consistent with the reads being Q20 or a little bit higher. The first application that we examined for RNA-seq using SBX data was looking at gene expression and isoform identification. Here's just a quick bird's eye view of a region of the genome, where we have the SBX-SLR reads.
They're aligned to the genome, with the reference isoform structures below, and the data are strand-specific. Okay. So the reads that are colored pink are on the top strand, and the reads that are colored blue are on the bottom strand. Now, the data itself isn't generated as strand-specific. We don't have a strand-specific library prep, but bioinformatically we can actually convert it to strand-specific, based on looking at the adapter sequences in the reads that we're sequencing.
So if we notice a cDNA adapter that corresponds to the bottom strand, then we can recognize that, reverse complement the read, and make it a top-strand read. That's like a search for strand specificity, and it's actually very effective: it's over 99% effective.
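The adapter-based strand assignment described here can be sketched roughly as follows. This is a minimal illustration and not the speaker's pipeline: the adapter sequence `SENSE_ADAPTER` is a made-up placeholder (the real SBX adapter sequences are not given in the talk), and exact prefix or suffix matching is assumed, whereas real reads would need error-tolerant matching.

```python
# Hypothetical adapter marking the 5' end of a top-strand read.
# Placeholder only; the actual adapter sequences are not in the talk.
SENSE_ADAPTER = "ACGTTGCAAGGCTTAC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def orient_read(seq: str):
    """Return (oriented_sequence, flipped).

    flipped is False if the read was already top-strand, True if it was
    reverse complemented, and None if no adapter was recognized.
    """
    if seq.startswith(SENSE_ADAPTER):
        return seq, False
    # A bottom-strand read carries the adapter's reverse complement at its end.
    if seq.endswith(reverse_complement(SENSE_ADAPTER)):
        return reverse_complement(seq), True
    return seq, None

top = SENSE_ADAPTER + "ATGGCCTTT"
bottom = reverse_complement(top)
print(orient_read(top)[1], orient_read(bottom)[1])
```

Reads for which no adapter is recognized are left unoriented rather than guessed, which is one plausible reason an approach like this would be slightly under 100% effective.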
In this work. Now that we have the reads aligned to the genome, we can assign reads to genes and get gene expression values. That's shown here for all 96 cell lines. We clustered the cell lines according to the gene expression correlation, and we see at the very top here how the cell lines are clustering, mostly according to the different lineages and cancer types.
Which is reassuring. We do have biological replicates included, and we have high correlation in gene expression for the biological replicates. There's another pair of cell lines where one cell line was historically derived from the other, and we find that the expression values there are also very highly correlated. Now, back to the bird's eye view of the genome.
Looking at the alignments, we can see that there's evidence for intron and exon structures. And when we compare that to the reference annotations, if we zoom in on one of our favorite genes here, which is the housekeeping gene GAPDH, we can see the reads actually do look quite nice. Most of the reads shown here are full length.
You can match up the reads with the reference isoform structures below, of which there are about a dozen different isoforms. At this locus, we're looking at almost a million aligned reads, so I'm really only showing you the top 50 or so reads out of that million. Right. So we're really just looking at the very top of this coverage landscape.
Now, just for comparison, here we have Illumina data from a publicly available data set, and you see the reads are a lot shorter. One of the key challenges when we're doing isoform-specific analysis is being able to unambiguously assign individual reads to individual isoforms. And that can be really challenging when the reads are really short.
Because you have a lot of read mapping uncertainty there. But in this case, we have plenty of reads that are full length and can be unambiguously mapped to the isoform from which they're derived. As one example, in this one cell line that I'm showing you, there's actually a handful of different reference isoforms for which we find full-length reads. The one at the very top here is the most highly expressed.
It accounts for more than 98% of the expression of GAPDH in this cell line, and we find over 63,000 full-length reads. For the others, we find fewer reads. The next one has about 3,000 full-length reads. And for the one at the very bottom here, we actually only find one full-length read, and that's out of a total of 742 million reads for this sample.
So it's quite rare. To get some more insight into how we can use these reads for resolving isoform structures, we assigned each read a structural category based on comparing it to the reference annotations. This just shows the different SQANTI-based categories, with the reference isoform on the top. We have the category of full splice match, which means that the read aligns with the full splicing pattern, matching up perfectly with the reference transcript.
We can have partial matches, where not all the splicing patterns are there, but the ones that are there agree, with no conflicts. And then we have novel categories: novel in catalog and novel not in catalog. Novel in catalog is just using the existing splice sites, but in different combinations, to give you novel products, and novel not in catalog involves new, novel splicing.
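To make these categories concrete, here is a simplified sketch of how a read's intron chain might be compared with reference isoforms. It is not SQANTI's implementation: the data layout (introns as `(donor, acceptor)` coordinate pairs), the function names, and the rule used for the talk's "isoform identifiable" category (a partial match compatible with exactly one reference isoform) are all assumptions made for illustration.

```python
def is_contiguous_subchain(sub, chain):
    """True if `sub` occurs as a run of consecutive introns within `chain`."""
    n = len(sub)
    return any(chain[i:i + n] == sub for i in range(len(chain) - n + 1))

def classify_read(read_introns, reference):
    """Classify a read by its introns against reference isoform intron chains.

    reference maps isoform id -> tuple of (donor, acceptor) introns.
    Returns (category, list_of_compatible_isoform_ids).
    """
    read = tuple(read_introns)
    if not read:
        return "unspliced", []

    # Full splice match: the whole intron chain equals a reference chain.
    full = [iso for iso, chain in reference.items() if chain == read]
    if full:
        return "full_splice_match", full

    # Partial match: consecutive reference introns, no conflicts.
    partial = [iso for iso, chain in reference.items()
               if is_contiguous_subchain(read, chain)]
    if len(partial) == 1:
        return "isoform_identifiable", partial
    if partial:
        return "incomplete_splice_match", partial

    # Novel: known splice sites in a new combination, or new sites.
    donors = {d for chain in reference.values() for d, _ in chain}
    acceptors = {a for chain in reference.values() for _, a in chain}
    if all(d in donors and a in acceptors for d, a in read):
        return "novel_in_catalog", []
    return "novel_not_in_catalog", []

ref = {
    "A": ((100, 200), (300, 400), (500, 600)),
    "B": ((100, 200), (300, 450), (500, 600)),
}
print(classify_read([(300, 450), (500, 600)], ref))
```

In this toy reference, a read covering only the first intron is compatible with both isoforms, while a read covering the second intron distinguishes them, which is the intuition behind a partial read still being useful for isoform-level expression.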
Oh, I have another category for you that is a useful one: isoform identifiable. Okay. These are basically partial reads, right? But they're long enough to be unambiguously assigned to a specific isoform. One of the first things we looked at was saturation of unambiguously identifiable isoforms as a function of sequencing depth.
We can see that around 50 million reads, we reach around 95% saturation of identifiable isoforms. But you may have diminishing returns going out to 100 million reads. Now, the saturation is going to be a result of not just the sequencing depth; it's also going to be impacted by the length of the transcript and how highly expressed it is.
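A saturation curve of this kind can be approximated by downsampling read-to-isoform assignments and counting how many distinct isoforms have been seen at each depth. The sketch below assumes each read has already been assigned to one isoform id (or `None` if it is not identifiable); the function name and input format are hypothetical, and this is not the analysis code used for the talk.

```python
import random

def saturation_curve(assignments, depths, seed=0):
    """Fraction of all identifiable isoforms detected at each read depth.

    assignments: list with one isoform id per read, or None for reads
    that cannot be assigned unambiguously.
    depths: iterable of read counts to evaluate.
    """
    rng = random.Random(seed)
    shuffled = list(assignments)
    rng.shuffle(shuffled)  # random order stands in for random subsampling

    total = len({a for a in shuffled if a is not None})
    curve = []
    for depth in depths:
        seen = {a for a in shuffled[:depth] if a is not None}
        curve.append((depth, len(seen) / total if total else 0.0))
    return curve

# Toy example: one abundant isoform, one moderate, one rare.
reads = ["A"] * 90 + ["B"] * 9 + ["C"] + [None] * 10
curve = saturation_curve(reads, [10, 50, len(reads)])
for depth, fraction in curve:
    print(depth, round(fraction, 2))
```

Rare isoforms such as "C" in the toy data tend to appear only at the deepest sampling, which mirrors the point about expression level affecting saturation.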
So if a transcript is on the shorter side and it's highly expressed, you have a better chance of getting a full splice match. But if it's longer and less expressed, you won't get a full splice match, but you might get isoform-identifiable reads. Right. So that's sort of like the next best thing. We looked for novel isoforms, and we were actually able to discover novel isoforms.
For each of these 96 cell lines, we assembled the transcripts de novo, using a tool that we developed called FLAIR. Then we compared those isoform structures to GENCODE in order to get these same kinds of classifications. What I'm showing here is a plot where we have the number of cell lines that had evidence of that isoform being independently reconstructed.
Altogether, we have way over 100,000 different isoform structures that we were able to assemble, but few of them are considered core isoforms. We have approximately 2,700 that are known isoforms found in all 96 of the cell lines. We only find less than 100 of each of these novel types in all 96 cell lines.
I just want to show you a couple, just to give you examples of the kinds of novel isoforms that we're finding from the SBX data. This first example is for TMEM14A, in one cell line, which is one of my favorite cell lines. We find three different isoforms that are reconstructed, at the top.
The red regions correspond to the coding regions of these structures. At the bottom you can see we've got plenty of full-length SBX reads and the known isoforms. We have a core known isoform on top, which represents over 90% of the expression for this gene at this locus in the cell line.
We do have a novel one that we found that is also core, in all 96, and it's expressed at 5.8%. Notice the differences here at the five prime end. We actually have two different novel isoforms; one's core and one's not core. But they differ at the five prime end, and these differences actually impact the coding regions.
So we actually have different N-termini in these protein sequences. Another example is this YRDC gene. It's a very similar story here. This gene is actually on the opposite strand, so your five prime end is now going to be over on this side, and your three prime end is going to be over on this side. But you see we have another three isoforms, and they differ here at the five prime end.
We have many reads that appear to be full-length SBX reads to support them. The big difference here is that the dominant form in this case actually turns out to be one of the novel core isoforms. The known core form is only found at around 18%, versus the 74%. So those are kind of interesting.
Now, it's nice to have full-length reads. We love our full-length reads. But you don't necessarily need to have full-length reads in order to do isoform profiling, right? If we have isoform-identifiable reads, that's long enough. And here are some cases where you have really long genes, so these transcripts are around 5.7 kb.
So there's no chance we're going to find a full-length SBX read for this. But we do find isoform-identifiable reads for these isoforms. And I've got a couple of examples. Both these examples are really relevant to cancer biology. This one example corresponds to the integrin ITGA6. There are two isoforms.
One of those forms is considered an oncogenic isoform that promotes tumor growth. You can see that it has an exon here that is actually skipped. And in the other isoform, you can make that out in yellow. When it includes this exon, basically adding this exon incorporates a stop codon, which actually truncates the C-terminus.
And if we look at the different cell lines and which isoforms are expressed, we find an interesting pattern of different isoforms having dominant expression across different cell lines. In this case, most of the cell lines that are expressing this are actually expressing the oncogenic version. Another example has to do with CD44, which is a transmembrane protein.
And in this case, you have this yellow region. There's a series of cassette exons that can be differentially spliced, and depending upon which exons are selected, they impact the stem region of the extracellular structure of this protein. And now there are three different forms. The top isoform here actually skips over the yellow region, producing this proteoform.
And we actually have full-length reads for this one. For the others we don't; we have partial reads. But again, that's good enough to do our expression profiling. We can find cell lines where one is really dominant, and the other ones are dominant in other cell lines. Just kind of interesting. Another application that we're very interested in is fusion transcript detection.
And that's really highly relevant to cancer. We can infer structural rearrangements based on that. There are fusion transcripts in cancer that are drivers of cancer, and there are very good examples of this. Usually it's chromosome rearrangements that happen in tumors that generate these. One of the best examples here involves a translocation between the bottom arms of chromosome 22 and chromosome 9.
This generates these chimeric chromosomes and puts the BCR gene fused with the ABL1 gene. And this actually causes 95% of chronic myelogenous leukemia. So this is the hallmark fusion. It's the best-known fusion that we have for cancer. You can detect it through whole genome sequencing, but it's much easier to detect it through transcriptome sequencing.
It's cheaper, and you actually get at the functional products that are being generated. A lot of these fusion transcripts are relevant to cancer. The COSMIC database has over 300 of them. Some of them are hallmarks of disease. Others you find at lower frequencies. In some cases, they're treatable. So if you have a kinase fusion, you can actually treat it with kinase inhibitors.
And it can improve the patient outcome. So it's important to identify these. We developed a tool, published earlier this year, called CTAT-LR-fusion. It uses long reads to identify fusion transcripts. There are a couple of different phases. In the first phase, we identify fusion candidates by finding chimeric reads, where part of the read aligns to one gene and part of it aligns to a different gene, which could be on a different chromosome.
Once we have those fusion gene candidates, we model them as fusion contigs: we essentially take the stretches of the DNA sequence corresponding to the genes, make them into a contig, and put the genes in the right order. Then we can realign the reads to it in order to get our breakpoint information and expression information.
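The first phase described here, collecting candidate gene pairs from chimeric reads, can be illustrated with a toy example. This is a hedged sketch of the general idea only and is not CTAT-LR-fusion's implementation: the input format (each read reduced to the ordered list of genes its alignment segments hit), the function name, and the `min_reads` threshold are all assumptions.

```python
from collections import Counter

def fusion_candidates(read_gene_hits, min_reads=2):
    """Count reads whose alignment segments hit exactly two different genes.

    read_gene_hits: dict of read id -> list of gene names, in read order,
    one entry per aligned segment.
    Returns {(upstream_gene, downstream_gene): supporting_read_count}.
    """
    support = Counter()
    for genes in read_gene_hits.values():
        # Collapse consecutive segments that hit the same gene.
        collapsed = []
        for gene in genes:
            if not collapsed or collapsed[-1] != gene:
                collapsed.append(gene)
        if len(collapsed) == 2:
            support[(collapsed[0], collapsed[1])] += 1
    return {pair: n for pair, n in support.items() if n >= min_reads}

reads = {
    "r1": ["BCR", "ABL1"],
    "r2": ["BCR", "BCR", "ABL1"],
    "r3": ["ABL1", "BCR"],   # orientation of a reciprocal fusion
    "r4": ["GAPDH"],         # ordinary, non-chimeric read
}
print(fusion_candidates(reads))
print(fusion_candidates(reads, min_reads=1))
```

Keeping the gene order in the key means a fusion and its reciprocal are counted separately. The second phase from the talk, building a fusion contig and realigning reads to locate breakpoints, is not covered by this sketch.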
So we took this and adapted it to work with the SBX reads. It didn't take too much work, but the challenge is working with like a billion reads instead of the 10 million reads that we're usually working with. And we screened the cancer cell lines we're working with for these COSMIC fusions. We have a list of known COSMIC fusions that were made available by the DepMap group.
They're based on Illumina data. Shown here on the left are the ones that we actually were able to find using CTAT-LR-fusion with the SBX reads. We found most of them. And the expression levels compared between SBX and Illumina for these fusions were significantly correlated. But there were some key differences here that we wanted to examine further.
And there were a few that we didn't find. That's not a fault of SBX in any way; it's because of our choice to initially sequence from just the ends of the cDNA, so we're really limited in terms of what kind of coverage we're getting from the transcripts that we're sequencing. I'll just highlight one of the examples that we have from the COSMIC fusion search in these cell lines.
It's the one that I showed earlier as an example, the BCR-ABL1 fusion. We found this in the MEG-01 cell line. Here we have the fusion contig with the BCR gene and the ABL1 gene put together. And here we have the SBX reads, and I'm pointing out where the breakpoints are supported by the SBX reads.
Not only do we find the BCR-ABL1 fusion, we actually find the reciprocal fusion from the reciprocal translocation, where we have the ABL1 gene and the BCR gene. You can see the SBX evidence for that. We've got reads here with a breakpoint here, splicing into the BCR gene over here. So when you have both, you can basically lay them on top of each other and figure out where on the genome the translocation might have actually occurred.
And here we can sort of narrow it down to the regions between where the splicing breakpoints were. In addition to the COSMIC fusions, a few cell lines have been included that have been very well studied for fusions. We actually surveyed over 20 different tools for Illumina RNA-seq for finding these fusions.
They're validated fusions, and these are the few cell lines that we find them in. We targeted SBX for that. Here we have the SBX findings versus the 20 or so other programs based on Illumina data. And the good news is that we're finding most of them. There was one that was particularly concerning here, having to do with this LAMP1-MCF2L. You can see with SBX we actually don't find it, but there's a bunch of cases where we actually did find it with Illumina data, and there's a bunch of different programs that were able to find it.
So we know it's real, and we should be able to find it. But when we dug into it a little bit more, we could see that, here's our fusion contig, LAMP1--MCF2L. Here's our Illumina data. Here's our Illumina evidence for that fusion. So we can see where the breakpoint is.
And then we look at the SBX data. We can see there's basically not much coverage here. And this is because MCF2L is a really long gene. You can see we have coverage, but the coverage is really restricted to the very ends. So we wanted to address this. All this work was done over the course of like 4 or 5 weeks.
Two weeks ago we said, let's try to fragment the cDNAs and see if we do any better by just running the fragments of cDNA through. So this is the original coverage plot I showed you earlier, where we had the high bias on the ends, and then we did the fragmentation.
We basically cleaned up a lot of the end bias that we were having. It's not entirely gone at this point. But again, this is one experiment, and there's still a lot of optimization to do here. But lo and behold, we actually now find the fusion evidence with SBX for this fusion, and we get better coverage within this region.
So it turned out to be fruitful. Overall, you know, looking back at this, this checks all the boxes. This was a preliminary analysis; there's a ton more work that we can do. But again, this was done over the course of just like the last few weeks. Does it work? We say yes. We think this is going to be a very, very powerful platform for future reference work.
There are some key notable advantages. The read length is obviously one of them: we get a lot of medium-length reads. I call them longish. With the longish reads, we start getting up to the 2 kb mark. And we can quickly yield, you know, billions of reads in a very, very short period of time, which is pretty exciting.
We do have a wishlist. There are things that really would benefit future transcriptomics applications with this. One of the obvious ones is getting a broader transcript coverage distribution. One issue now is that when you do this fragmentation, you actually lose that strand specificity that we had earlier for the internal fragments.
But this is all stuff that is solvable, you know, nothing to worry about there. More reads: we want more reads that are in the 1 kb and sort of longer-read territory, pushing the limits there, which also presumably is going to be doable. And then improving the base calling accuracy for simplex reads.
Why is this? Because we'd like to have high-accuracy reads. But, you know, the Q20-plus wasn't holding us back in any way. So overall, very excited about this. I'd like to acknowledge the Broad Clinical Labs, the methods development group that I worked in, headed by Issac Kohane, the DepMap group, and the cancer data science group for collaborating with us on the cell lines, and, of course, our partners at Roche for really giving us this early access to the sequence data.
Exciting stuff.
The AXELIOS 1 sequencing platform and sequencing by expansion (SBX) technology are in development and not commercially available. The content of this material reflects current research study results and/or design goals. The AXELIOS 1 sequencing platform based on SBX technology will be launched for Research Use Only. Not for use in diagnostic procedures. AXELIOS is a trademark of Roche.