| seqkit sample -p 0.1 \
| seqkit head -n 1000 -o sample.fa.gz
# Set a random seed so the result is reproducible: -s 11
zcat hairpin.fa.gz \
| seqkit sample -p 0.1 -s 11 | head
# Shuffle the sequences after sampling: seqkit shuffle
zcat hairpin.fa.gz \
| seqkit sample -p 0.1 \
...
| seqkit sample -p 0.1 \
| seqkit head -n 1000 -o sample.fa.gz
# Set a random seed so the result is reproducible: -s 11
zcat hairpin.fa.gz \
| seqkit sample -p 0.1 -s 11 | head
# Shuffle the sequences after sampling: seqkit shuffle
zcat hairpin.fa.gz \
| seqkit sample -p 0.1 \
| seqkit shuffle -o sample.fa.gz
1...
seqkit sample [flags]
Parameters:
  -n, --number int        sample by number (result may not exactly match)
  -p, --proportion float  sample by proportion
  -s, --rand-seed int     rand seed for shuffle (default 11)
  -2, --two-pass          2-pass mode, lower memory
Example: randomly sample sequences
seqkit sample -n 10000 -s 11 test1_1.fq -o sam...
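sample   sample sequences by number or proportion
sana     clean up broken single-line FASTQ files
scat     real time recursive concatenation and streaming of fastx files
seq      transform sequences (reverse, complement, extract ID…)
shuffle  shuffle sequences
sliding  extract subsequences in sliding windows, circular genomes supported
sort     sort sequences by id/name/sequence/length
...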
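10. sample
zcat hairpin.fa.gz | seqkit sample -p 0.1 -o sample.fa.gz   # sample by proportion
zcat hairpin.fa.gz | seqkit sample -n 1000 -o sample.fa.gz  # sample by number
11. rename
cat in.fa | less
# Difference from rename in seqtk: seqtk renumbers the sequences from 1 to n, whereas seqkit appends suffixes _2 through _n to later duplicated entries
...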
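sample: sample by number or proportion
sana: clean up incomplete single-line FASTQ files
scat: concatenate fastx files
seq: can be used to select, filter out, or randomly extract sequences from FASTA or FASTQ files
shuffle: shuffle sequences
sliding: extract subsequences in sliding windows
sort: sort by id/name/sequence/length
split: split sequences into files by id/seq region/size/parts (mainly for FASTA)
...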
sample by number (result may not exactly match)
  -p, --proportion float  sample by proportion
  -s, --rand-seed int     rand seed for shuffle (default 11)
  -2, --two-pass          2-pass mode, lower memory
Example: randomly sample sequences
seqkit sample -n 10000 -s 11 test1_1.fq -o sample.fq
...
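To make the roles of -p and -s concrete, here is a minimal sketch written with awk rather than seqkit. It is an illustration only, not seqkit's implementation, and the names sample_fasta_by_proportion and make_demo_fasta are hypothetical. Each record is kept independently with probability p, so the number of records returned is only approximately p times the input size, while a fixed seed makes the selection repeatable.

```shell
# Hypothetical helper: keep each FASTA record with probability $1, using seed $2.
# Reads FASTA from stdin. An awk illustration under stated assumptions, not seqkit's code.
sample_fasta_by_proportion() {
  awk -v p="$1" -v seed="$2" '
    BEGIN { srand(seed) }            # a fixed seed gives a repeatable random stream
    /^>/  { keep = (rand() < p) }    # decide once per record, at its header line
    keep  { print }                  # print header and sequence lines of kept records
  '
}

# Build a small demo FASTA with 100 records (names and sequences are made up).
make_demo_fasta() {
  i=1
  while [ "$i" -le 100 ]; do
    printf '>seq%d\nACGT\n' "$i"
    i=$((i + 1))
  done
}

run1=$(make_demo_fasta | sample_fasta_by_proportion 0.1 11)
run2=$(make_demo_fasta | sample_fasta_by_proportion 0.1 11)

# Same seed and same input: the two samples are identical.
if [ "$run1" = "$run2" ]; then echo "reproducible"; fi

# Count the records kept; expect a value near 10, not exactly 10.
printf '%s\n' "$run1" | grep -c '^>' || true
```

Changing the second argument (the seed) should generally select a different subset, which mirrors what the -s option is described as doing above.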
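As the figure above shows, the usage is seqtk sample in.fa <fraction|number>; that is, we can sample either a fraction of the sequences or a specific number of them.
seqtk sample atha.fasta 2
If we do not change the random seed, every run gives the same result.
-s changes the random seed
2.3 subseq
Use this command to extract sequences. ...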
sample   sample sequences by number or proportion
seq      transform sequences (reverse, complement, extract ID...)
shuffle  shuffle sequences
sliding  sliding sequences, circular genome supported
sort     sort sequences by id/name/sequence/length
split    split sequences into files by id/seq region/size/parts...