Friday, May 21, 2010

Generating Random Amino Acid Protein Sequences Using Python/Biopython

This program generates random protein sequences of 100 amino acids each and stores them in a database-style file. Sequence analysis has given bioscience research a new direction, and this project contributes by generating new protein sequences. Some of these sequences might already exist in nature or be similar to sequences found in some organism. With this software, new random sequences can be generated and saved to a file on the user's machine. Saving to a file makes it easier to compare different sequences and keeps all the information in one place.
The protein generation part is a simple program that takes two user inputs: the number of random protein sequences to generate and a filename or file path. Based on these inputs, it generates random protein sequences of 100 amino acid residues each, typed as IUPACProtein. The generated sequences are saved in FASTA format to a file named after the user's input with .fasta appended (a bare filename is created in the current directory). If the file cannot be opened, the sequences are printed to the console instead.
This program is written in Python using Biopython modules. The central object in bioinformatics is the sequence, so the program starts with Biopython's mechanisms for dealing with sequences.
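The idea described above can be sketched without Biopython, using only the Python 3 standard library. This is a minimal illustration and not the program from this post: the names AMINO_ACIDS, random_protein and to_fasta are hypothetical, and the 60-column line wrap is an assumption about a common FASTA layout.

```python
import random

# The 20 standard amino acids in one-letter code
# (the same letters as Biopython's IUPACProtein alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein(length=100, rng=random):
    """Return a random string of `length` one-letter amino acid codes."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def to_fasta(seq_id, description, sequence, width=60):
    """Format one record as FASTA text: a '>' header, then wrapped sequence lines."""
    lines = [">%s %s" % (seq_id, description)]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Print three random records to the console.
for n in range(1, 4):
    print(to_fasta("randSeq%d" % n, "A random sequence", random_protein()), end="")
```

Passing a seeded random.Random instance as rng makes the output reproducible, which helps when comparing runs.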
# File Name RandonProteinSequences.py
# standard library
import os
import random

# biopython
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.SeqRecord import SeqRecord
import Bio.writers.SeqRecord.fasta
from Bio import SeqIO
# only stdout is needed from sys (used when no file path is given)
from sys import stdout

residueList1 = ["C","D","E","F","G","H","I"]
residueList2 = ["A","K","L","M","N","S"]
residueList3 = ["P","Q","R","T","V","W","Y"]
# RNA bases; not used by the protein generator below
residueList4 = ["C","A","G","U"]
def getProteinSeqRecord(residue, seqcount):
    # build a 100-residue string by picking letters at random from `residue`
    strSeq = ""
    for i in range(0,100,1):
        index = random.randint(0, len(residue)-1)
        strSeq += residue[index]

    # wrap the string in a SeqRecord so it can be written as FASTA
    sequence = Seq(strSeq, IUPAC.IUPACProtein)
    seqRec = SeqRecord(sequence, id = 'randSeq' + str(seqcount), description= 'A random sequence using Amino acid residues.')
    return seqRec

def getProteinSequence(residue):
    # same as above, but returns the bare Seq object without id or description
    strSeq = ""
    for i in range(0,100,1):
        index = random.randint(0, len(residue)-1)
        strSeq += residue[index]

    sequence = Seq(strSeq, IUPAC.IUPACProtein)
    return sequence

def randomProteinSeqRecord(index):
    # choose the residue list from the record number:
    # even -> residueList1, odd multiple of 3 -> residueList2, otherwise residueList3
    if(index%2)==0:
        return getProteinSeqRecord(residueList1, index)
    elif(index%3)==0:
        return getProteinSeqRecord(residueList2, index)
    else:
        return getProteinSeqRecord(residueList3, index)

# information
print '--- This is a Python based program to generate random sequences ---'
print '--- Provide number of random sequences to generate. Default 10 ---'
print '--- In order to save to a file provide file path or filename ---'
print '--- If none or invalid filepath is provided then results will be displayed to console ---'
print '--- The file will be created in fasta format ---'
print

filepathProvided = False
# raw_input returns the user input as a string
try:
    filepath = raw_input('Enter filepath to save sequences ... ')
    filepath = filepath + '.fasta'
    # open and close once to check that the path is writable
    handle = open(filepath, "w")
    handle.close()

    filepathProvided = True
except IOError:
    print 'Invalid or no file provided, will print results to console'
    print

ranSeqCount = 10
try:
    ranSeqCount = int(raw_input('Enter number of random sequences to generate ... '))
except ValueError:
    # fall back to the default when the input is not an integer
    ranSeqCount = 10

# write to the file when a valid path was given, otherwise to the console
if filepathProvided:
    handle = open(filepath, "w")
    fasta_writer = Bio.writers.SeqRecord.fasta.WriteFasta(handle)
else:
    fasta_writer = Bio.writers.SeqRecord.fasta.WriteFasta(stdout)

print 'Sequence Count : '
print ranSeqCount

# records are numbered from 1
for i in range(0,ranSeqCount,1):
    fasta_writer.write(randomProteinSeqRecord(i+1))

if filepathProvided:
    handle.close()
    print 'File created at : ' + filepath

print
raw_input('Press Enter to exit ...')
print
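
One detail of the script above is worth tracing: randomProteinSeqRecord picks a residue list from the record number, so each sequence is built from only six or seven amino acids, not all twenty. The Python 3 sketch below mirrors that branch order with a hypothetical helper, pool_name_for, and counts how often each list is used for the default of 10 records: five come from residueList1, two from residueList2 and three from residueList3.

```python
from collections import Counter

# Residue lists copied from the script above.
residueList1 = ["C", "D", "E", "F", "G", "H", "I"]
residueList2 = ["A", "K", "L", "M", "N", "S"]
residueList3 = ["P", "Q", "R", "T", "V", "W", "Y"]


def pool_name_for(index):
    """Mirror the branch order of randomProteinSeqRecord (hypothetical helper)."""
    if index % 2 == 0:
        return "residueList1"
    elif index % 3 == 0:
        return "residueList2"
    return "residueList3"


# The script numbers its records from 1, so trace indexes 1..10 (the default count).
usage = Counter(pool_name_for(i) for i in range(1, 11))
print(usage)

# Taken together, the three lists cover all 20 standard amino acids.
all_residues = set(residueList1) | set(residueList2) | set(residueList3)
print(len(all_residues))
```

To draw every sequence from the full set of amino acids instead, the three lists could be merged into one before sampling.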

This software also helps users create protein sequences drawn evenly from amino acids of their own choice, for example by editing the residue lists. This means users can build new protein sequence database-style files to support studies and research on a variety of species and organisms.
Such studies rest on the relationship between proteins and genes, which depend on each other in the functioning of an organism.
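
To check how evenly the amino acids are distributed in a generated sequence, the residue composition can be counted. The sketch below is a minimal Python 3 illustration using only the standard library. The function name composition is hypothetical and is not part of the program above.

```python
from collections import Counter


def composition(sequence):
    """Return each residue's fraction of the sequence, keyed by one-letter code."""
    counts = Counter(sequence)
    total = len(sequence)
    # An empty sequence has no residues, so the result is an empty dict.
    return {residue: n / total for residue, n in sorted(counts.items())}


# Example with a short hand-written sequence.
for residue, fraction in composition("AAKLMNSA").items():
    print(residue, round(fraction, 3))
```

For a 100-residue sequence drawn uniformly from a list of six residues, each fraction should be roughly 1/6, with some random variation from run to run.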
