I had to do some Googling to find the appropriate bash commands to process the FASTA files from the terminal command line.

1) I used cat to concatenate all the files into 1 large file.
2) I want to sort by line size, but if you do that with a normal FASTA file, the header is read as a separate line from the sequence body (so it would delete everything under 250 INCLUDING the FASTA headers, which have all the information I need!). Therefore, I had to remove the newlines (\n) and put in a * so that each header and its sequence are read as one long line.
3) Next I used awk to sort based on size, and I removed all sequences that are less than 250 base pairs.
4) Finally, I put the FASTA format back the way it was before I sorted based on size.

One challenge I face is moving the data around and formatting it properly. All 3 of these genomes come from Illumina next-gen sequencing. I am doing a 3-way comparison of repeat elements in 3 species of anole lizards.
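For anyone who wants to try something similar, here is a rough sketch of the join, filter, and restore steps. It is not the exact set of commands I ran. The function name filter_fasta_by_length and the file names in the comment are made up for illustration. The sketch assumes that * never appears in a header or sequence, it measures the sequence only (not the header), and it only filters by length without doing any sorting.

```shell
# Sketch only: keep FASTA records whose sequence has at least MIN bases.
# Reads FASTA on stdin, writes filtered FASTA on stdout.
# Assumption: "*" does not occur anywhere in the headers or sequences.
filter_fasta_by_length() {
    # Join each record onto one line as "header*sequence".
    awk '
        /^>/ { if (seen) printf "\n"; seen = 1; printf "%s*", $0; next }
             { printf "%s", $0 }
        END  { if (seen) printf "\n" }
    ' |
    # Split on "*" and keep records whose sequence is long enough.
    awk -F'[*]' -v min="$1" 'length($2) >= min' |
    # Turn "*" back into a newline to restore the header/sequence layout.
    tr '*' '\n'
}

# Hypothetical usage with made-up file names:
#   cat species1.fasta species2.fasta species3.fasta \
#       | filter_fasta_by_length 250 > filtered.fasta
```

One difference from step 4 above: the output has each sequence on a single line, so any line wrapping in the original files is not restored.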