publications
Publications by category in reverse chronological order.
2024
- Spotiflow: accurate and efficient spot detection for imaging-based spatial transcriptomics with stereographic flow regression. Albert Dominguez Mantes, Antonio Herrera, Irina Khven, and 7 more authors. bioRxiv, 2024
The identification of spot-like structures in large and noisy microscopy images is an important task in many life science techniques, and it is essential to their quantitative performance. For example, imaging-based spatial transcriptomics (iST) methods rely critically on the accurate detection of millions of transcripts in low signal-to-noise ratio (SNR) images. While recent developments in computer vision have revolutionized many bioimage tasks, currently adopted spot detection approaches for iST still rely on classical signal processing methods that are fragile and require manual tuning. In this work we introduce Spotiflow, a deep-learning method that casts the spot-detection problem as a multiscale stereographic flow regression problem that yields subpixel-accurate localizations. Spotiflow is robust to different noise conditions and generalizes across different chemistries while being up to an order of magnitude more time- and memory-efficient than commonly used methods. We show the efficacy of Spotiflow by comprehensive quantitative comparisons against other methods on a variety of datasets and demonstrate the impact of its increased accuracy on the biological conclusions drawn from iST and live imaging experiments. Spotiflow is available as an easy-to-use Python library as well as a napari plugin at http://www.github.com/weigertlab/spotiflow.
- Statistical inference with a manifold-constrained RNA velocity model uncovers cell cycle speed modulations. Alex R. Lederer, Maxine Leonardi, Lorenzo Talamanca, and 10 more authors. bioRxiv, 2024
Across a range of biological processes, cells undergo coordinated changes in gene expression, resulting in transcriptome dynamics that unfold within a low-dimensional manifold. Single-cell RNA-sequencing (scRNA-seq) only measures temporal snapshots of gene expression. However, information on the underlying low-dimensional dynamics can be extracted using RNA velocity, which models unspliced and spliced RNA abundances to estimate the rate of change of gene expression. Available RNA velocity algorithms can be fragile and rely on heuristics that lack statistical control. Moreover, the estimated vector field is not dynamically consistent with the traversed gene expression manifold. Here, we develop a generative model of RNA velocity and a Bayesian inference approach that solves these problems. Our model couples velocity field and manifold estimation in a reformulated, unified framework, so as to coherently identify the parameters of an autonomous dynamical system. Focusing on the cell cycle, we implemented VeloCycle to study gene regulation dynamics on one-dimensional periodic manifolds and used live imaging to validate its ability to infer actual cell cycle periods. We benchmarked RNA velocity inference with sensitivity analyses and demonstrated one- and multiple-sample testing. We also conducted Markov chain Monte Carlo inference on the model, uncovering key relationships between gene-specific kinetics and our gene-independent velocity estimate. Finally, we applied VeloCycle to in vivo samples and in vitro genome-wide Perturb-seq, revealing regionally defined proliferation modes in neural progenitors and the effect of gene knockdowns on cell cycle speed. Ultimately, VeloCycle expands the scRNA-seq analysis toolkit with a modular and statistically rigorous RNA velocity inference framework. Competing Interest Statement: The authors have declared no competing interest.
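As a rough illustration of the RNA velocity idea mentioned in the abstract above, the sketch below uses the commonly cited first-order splicing kinetics, ds/dt = beta * u - gamma * s, where u and s are unspliced and spliced abundances. This is not the VeloCycle model, which is manifold-constrained and Bayesian; the function name `rna_velocity`, the rate values, and the toy counts are assumptions made for illustration only.

```python
import numpy as np


def rna_velocity(unspliced, spliced, beta, gamma):
    """Per-gene rate of change of spliced RNA under ds/dt = beta*u - gamma*s.

    unspliced, spliced: arrays of shape (cells, genes).
    beta, gamma: per-gene splicing and degradation rates, shape (genes,).
    """
    return beta * unspliced - gamma * spliced


# Toy data: 2 cells x 2 genes (illustrative values, not real measurements).
u = np.array([[2.0, 0.5],
              [1.0, 1.0]])
s = np.array([[1.0, 2.0],
              [4.0, 1.0]])
beta = np.array([1.0, 1.0])    # assumed splicing rates
gamma = np.array([0.5, 0.5])   # assumed degradation rates

v = rna_velocity(u, s, beta, gamma)
# Positive entries suggest a gene is being upregulated in that cell,
# negative entries suggest downregulation.
print(v)
```

In this simple form each gene is treated independently, which is the kind of heuristic the abstract contrasts with a coupled velocity-and-manifold estimate.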
2023
- Neural ADMIXTURE for rapid genomic clustering. Albert Dominguez Mantes, Daniel Mas Montserrat, Carlos D. Bustamante, and 2 more authors. Nature Computational Science, 2023
Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments, with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude, surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by computing multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.
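The decomposition described in the abstract above, fractional cluster assignments combined with per-cluster variant frequencies, can be sketched as a small simulation. This is a minimal illustration of the standard admixture model (expected allele frequency P = Q F, genotypes drawn as Binomial(2, P)), not the Neural ADMIXTURE implementation; the sizes, the random seed, and the variable names `Q`, `F`, `P`, and `G` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_variants, n_clusters = 5, 8, 3  # toy sizes (assumed)

# Q: fractional cluster assignments, one row per individual, rows sum to 1.
Q = rng.dirichlet(np.ones(n_clusters), size=n_individuals)

# F: per-cluster variant frequencies, one row per cluster, entries in (0, 1).
F = rng.uniform(0.05, 0.95, size=(n_clusters, n_variants))

# Expected allele frequency of each individual at each variant.
P = Q @ F

# Diploid genotypes: number of alternate alleles (0, 1, or 2) per variant.
G = rng.binomial(2, P)

print(Q.round(2))
print(G)
```

Fitting the model means recovering Q and F from G alone. Because each row of Q is a convex combination, every entry of P stays a valid frequency between 0 and 1.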
2022
- Archetypal Analysis for population genetics. Julia Gimbernat-Mayol, Albert Dominguez Mantes, Carlos D. Bustamante, and 2 more authors. PLOS Computational Biology, Aug 2022
The estimation of genetic clusters using genomic data has applications ranging from genome-wide association studies (GWAS) to demographic history to polygenic risk scores (PRS) and is expected to play an important role in the analyses of increasingly diverse, large-scale cohorts. However, existing methods are computationally intensive, prohibitively so in the case of nationwide biobanks. Here we explore Archetypal Analysis as an efficient, unsupervised approach for identifying genetic clusters and for associating individuals with them. Such unsupervised approaches help avoid conflating socially constructed ethnic labels with genetic clusters by eliminating the need for exogenous training labels. We show that Archetypal Analysis yields similar cluster structure to existing unsupervised methods such as ADMIXTURE and provides interpretative advantages. More importantly, we show that since Archetypal Analysis can be used with lower-dimensional representations of genetic data, significant reductions in computational time and memory requirements are possible. When Archetypal Analysis is run in such a fashion, it takes several orders of magnitude less compute time than the current standard, ADMIXTURE. Finally, we demonstrate uses ranging across datasets from humans to canids.