Query & integrate data

import lamindb as ln
import bionty as bt

💡 connected lamindb: testuser1/test-facs

ln.settings.transform.stem_uid = "wukchS8V976U"
ln.settings.transform.version = "0"
ln.track()

💡 notebook imports: bionty==0.42.7 lamindb==0.69.9

💡 saved: Transform(uid='wukchS8V976U6K79', name='Query & integrate data', key='facs3', version='0', type='notebook', updated_at=2024-04-10 17:54:26 UTC, created_by_id=1)

💡 saved: Run(uid='CWvZ30bs7lqwqfnMpwwz', transform_id=3, created_by_id=1)

Inspect the CellMarker registry

Inspect your aggregated cell marker registry as a DataFrame:

bt.CellMarker.df().head()

id	uid	name	synonyms	gene_symbol	ncbi_gene_id	uniprotkb_id	organism_id	public_source_id	created_at	updated_at	created_by_id
41	7SyRazPQeCqG	CD14/19	None	None	None	None	1	NaN	2024-04-10 17:54:20.526397+00:00	2024-04-10 17:54:20.526418+00:00	1
40	6ASIQ7GR2c39	CD103		ITGAE	3682	P38570	1	18.0	2024-04-10 17:54:20.491143+00:00	2024-04-10 17:54:20.491153+00:00	1
39	7OES2NXy0W6C	CD69		CD69	969	Q07108	1	18.0	2024-04-10 17:54:20.491047+00:00	2024-04-10 17:54:20.491058+00:00	1
38	4Y0JkNLWc8tl	CD49B		ITGA2	3673	P17301	1	18.0	2024-04-10 17:54:20.490949+00:00	2024-04-10 17:54:20.490960+00:00	1
37	2ddvD3rZZ38f	CXCR4		CXCR4	7852	P61073	1	18.0	2024-04-10 17:54:20.490849+00:00	2024-04-10 17:54:20.490861+00:00	1

Search for a marker (synonym-aware):

bt.CellMarker.search("PD-1").head(2)

name	uid	synonyms	score
PD1	6c7MomnrsfYu	PID1|PD-1|PD 1	100.0
CD14/19	7SyRazPQeCqG		54.5
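
To illustrate why "PD-1" finds the record named PD1 with a score of 100, here is a minimal sketch of a synonym-aware search. It is not how `bt.CellMarker.search` is implemented: the toy `REGISTRY`, the `similarity` and `search` helpers, and the use of `difflib` for scoring are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy registry: name -> pipe-separated synonyms, mirroring the
# `synonyms` column shown above.
REGISTRY = {
    "PD1": "PID1|PD-1|PD 1",
    "CD14/19": "",
    "CD8": "",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(query: str, registry: dict[str, str]) -> list[tuple[str, float]]:
    """Rank names by the best score over the name and each of its synonyms."""
    ranked = []
    for name, synonyms in registry.items():
        candidates = [name] + [s for s in synonyms.split("|") if s]
        ranked.append((name, max(similarity(query, c) for c in candidates)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


results = search("PD-1", REGISTRY)
print(results[0])  # ('PD1', 100.0): the synonym "PD-1" matches exactly
```

Scoring each record by its best-matching synonym means a query can use any known alias and still return the canonical name.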

Look up markers with auto-complete:

markers = bt.CellMarker.lookup()

markers.cd8

Private registry
Entity: CellMarker
📖 .df(): reference table
🔎 .lookup(): autocompletion of terms
🎯 .search(): free text search of terms
✅ .validate(): strictly validate values
🧐 .inspect(): full inspection of values
👽 .standardize(): convert to standardized names
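
The idea behind a lookup object can be sketched in plain Python: each record name becomes an attribute, so an editor or notebook can tab-complete it. This is only an illustration and not lamindb's implementation. The helpers `to_attribute` and `build_lookup` are hypothetical, and the normalization rule (lower-case, non-word characters replaced by underscores) is an assumption.

```python
import re
from types import SimpleNamespace


def to_attribute(name: str) -> str:
    """Turn a display name into a valid, lower-case Python identifier."""
    key = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
    # Identifiers cannot start with a digit, so prefix those with "_".
    return f"_{key}" if key[:1].isdigit() else key


def build_lookup(names: list[str]) -> SimpleNamespace:
    """Expose each name as an attribute so editors can tab-complete it."""
    return SimpleNamespace(**{to_attribute(n): n for n in names})


# Hypothetical marker names; a real lookup would hold registry records.
toy_markers = build_lookup(["CD8", "CD14/19", "PD1", "CD45RA"])
print(toy_markers.cd8)      # CD8
print(toy_markers.cd14_19)  # CD14/19
```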

Query artifacts by markers

Query panels and artifacts by marker, e.g., which artifacts have 'CD8' in their flow panel:

panels_with_cd8 = ln.FeatureSet.filter(cell_markers=markers.cd8).all()

ln.Artifact.filter(feature_sets__in=panels_with_cd8).df()

id	uid	storage_id	key	suffix	accessor	description	version	size	hash	hash_type	n_objects	n_observations	transform_id	run_id	visibility	key_is_virtual	created_at	updated_at	created_by_id
1	F7Q3UeB48eoUVjZFTSu1	1	None	.h5ad	AnnData	Alpert19	None	33369696	VsTnnzHN63ovNESaJtlRUQ	md5	None	None	1	1	1	True	2024-04-10 17:54:11.193142+00:00	2024-04-10 17:54:11.317701+00:00	1
2	YwJCInfD0s5prm62GDgj	1	None	.h5ad	AnnData	Oetjen18_t1	None	46501304	I8nRS02iBs5z1J01b2qwOg	md5	None	None	2	2	1	True	2024-04-10 17:54:20.939413+00:00	2024-04-10 17:54:21.018519+00:00	1
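
The query above runs in two steps: first find the feature sets (panels) that contain the marker, then find the artifacts linked to any of those feature sets. A minimal sketch of that logic with plain dictionaries, where all ids, panel contents, and artifact names are hypothetical toy data rather than values from this instance:

```python
# Hypothetical toy data: feature set id -> marker names in that panel.
toy_feature_sets = {
    1: {"CD3", "CD8", "CD27"},
    2: {"CD3", "CD8", "CD45RA"},
    3: {"CD3", "CD19"},
}
# Hypothetical artifacts: name -> ids of the feature sets linked to it.
toy_artifacts = {
    "panel_a": [1],
    "panel_b": [2],
    "panel_c": [3],
}

# Step 1: feature sets whose panel contains the marker.
sets_with_cd8 = {
    fs_id for fs_id, panel in toy_feature_sets.items() if "CD8" in panel
}

# Step 2: artifacts linked to at least one of those feature sets.
matching = [
    name
    for name, fs_ids in toy_artifacts.items()
    if sets_with_cd8.intersection(fs_ids)
]
print(matching)  # ['panel_a', 'panel_b']
```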

Access the feature registry through a lookup object:

features = ln.Feature.lookup()

Find shared cell markers between two files:

artifacts = ln.Artifact.filter(feature_sets__in=panels_with_cd8).list()
file1, file2 = artifacts[0], artifacts[1]

shared_markers = file1.features["var"] & file2.features["var"]
shared_markers.list("name")

['Cd4', 'CD8', 'CD3', 'CD27', 'Ccr7', 'CD45RA']
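
Conceptually, the `&` above is a set intersection on marker records. A minimal sketch with plain lists of names, assuming exact, case-sensitive matching (so 'Cd4' and 'CD4' would count as different markers); the `shared` helper and both panels are hypothetical:

```python
def shared(first: list[str], second: list[str]) -> list[str]:
    """Names present in both panels, in the order they appear in `first`."""
    in_second = set(second)
    return [name for name in first if name in in_second]


# Hypothetical panels; the comparison is exact, so "Cd4" and "CD4" differ.
panel_1 = ["Cd4", "CD8", "CD3", "CD57"]
panel_2 = ["CD3", "CD8", "Cd4", "CD4", "CD19"]
print(shared(panel_1, panel_2))  # ['Cd4', 'CD8', 'CD3']
```

Standardizing names before intersecting avoids missing markers that differ only in spelling or case.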