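It is better to use the CCDS record file. The CCDS database aims to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The human data was last updated in 2018 and includes 33,397 CCDS IDs covering 19,033 genes. It can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/, after which bedtools has to be run in a Linux or Mac environment to ...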
GJB6:ENST00000400065.3:wholegene,CRYL1:ENST00000382812.1:wholegene,GJB6:ENST00000356192.6:wholegene,GJB6:ENST00000400066.3:wholegene, 13 20797176 21105944 0 - comments: a 342kb deletion encompassing GJB6, associated with hearing loss
cat CCDS.20180614.txt | perl -alne '{/\[(.*?)\]/;next unless $1;$gene=$F[2];$exons=$1;$exons=~s/\s//g;$exons=~s/-/\t/g;print "$F[0]\t$_\t$gene" foreach split/,/,$exons;}' | sort -u | bedtools sort -i - >exon_probe.hg38.gene.bed ...
grep -w Public CCDS.20191024.txt | perl -alne '{/\[(.*?)\]/;next unless $1;$gene=$F[2];$exons=$1;$exons=~s/\s//g;$exons=~s/-/\t/g;print "$F[0]\t$_\t$gene" foreach split/,/,$exons;}' | sort -u | bedtools sort -i - >mm10.exon ...
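The Perl one-liners above are dense, so here is a minimal Python sketch of the same transformation, for readers who want to see each step. It assumes the CCDS text file is whitespace-delimited, with the chromosome in the first column, the gene symbol in the third, and the CDS intervals in a bracketed list such as `[100-199, 300-399]`, which is what the one-liners rely on. The function names `ccds_line_to_bed` and `ccds_to_bed` and the records in `SAMPLE` are hypothetical, made up for illustration; they are not real CCDS entries.

```python
import re

# The bracketed interval list, e.g. "[100-199, 300-399]".
LOCATIONS = re.compile(r"\[(.*?)\]")

def ccds_line_to_bed(line):
    """Return (chrom, start, end, gene) tuples for one CCDS record line."""
    match = LOCATIONS.search(line)
    if not match or not match.group(1).strip():
        return []  # header line or a record without CDS locations
    fields = line.split()
    chrom, gene = fields[0], fields[2]
    rows = []
    # Drop whitespace, then split "a-b,c-d" into individual intervals.
    for exon in re.sub(r"\s", "", match.group(1)).split(","):
        start, end = exon.split("-")
        rows.append((chrom, int(start), int(end), gene))
    return rows

def ccds_to_bed(lines, public_only=False):
    """Collect unique intervals, sorted by chromosome name, then start."""
    rows = set()
    for line in lines:
        # Mirrors `grep -w Public`: keep lines containing the word "Public".
        if public_only and not re.search(r"\bPublic\b", line):
            continue
        rows.update(ccds_line_to_bed(line))
    return sorted(rows)

# Hypothetical records in the assumed column layout (not real CCDS data).
SAMPLE = [
    "#chromosome\tnc_accession\tgene\tgene_id\tccds_id\tccds_status",
    "1\tNC_X\tGENE_A\t1\tCCDS_A\tPublic\t+\t100\t399\t[100-199, 300-399]\tIdentical",
    "1\tNC_X\tGENE_A\t1\tCCDS_B\tPublic\t+\t100\t399\t[100-199, 300-399]\tIdentical",
    "1\tNC_X\tGENE_B\t2\tCCDS_C\tWithdrawn\t-\t-\t-\t-\tIdentical",
]

for chrom, start, end, gene in ccds_to_bed(SAMPLE, public_only=True):
    print(f"{chrom}\t{start}\t{end}\t{gene}")
```

Like the one-liners, the sketch copies the coordinates unchanged. Whether the end coordinate needs adjusting to fit BED's half-open convention should be checked against the CCDS documentation before the output is used with bedtools.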
CCDS Release 24 includes a total of 35,608 CCDS IDs that correspond to 19,107 GeneIDs, with 48,062 protein sequences from NCBI and 47,762 from Ensembl. See the Releases & Statistics report for details. II. Discussion 2.2 Download link: https://ftp.ncbi.nlm.nih.gov/pub/CCDS/current_human/ 2.1...
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence toward a standard set of gene annotations. consensus; www.ncbi.nlm.nih.go...