transposonmapper.properties
- transposonmapper.properties.chromosome_position(gff_file)
Get the start and end position of each chromosome and determine its length. The input is a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index. The output is three dictionaries holding the length, start position, and end position of each chromosome. The keys of all three dictionaries are the chromosome numbers in Roman numerals. To get all three dictionaries, use ‘a, b, c = chromosome_position(gff_file)’, where ‘a’ = chromosome length, ‘b’ = chromosome start position, and ‘c’ = chromosome end position.
- Parameters
gff_file (str) – The file path of a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index
- Returns
dict – A dictionary relating each chromosome with its length
dict – A dictionary relating each chromosome with its start position
dict – A dictionary relating each chromosome with its end position
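Because the keys of the three dictionaries are Roman numerals, sorting them alphabetically does not give chromosome order. The sketch below shows one possible way to order them numerically. The helper `roman_to_int` and the sample dictionary are hypothetical and not part of transposonmapper; the lengths are placeholders.

```python
# Hypothetical helper for illustration; not part of transposonmapper.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XIV' to an integer.

    Only the symbols I, V and X are handled, which covers the sixteen
    nuclear yeast chromosomes. Any other key raises a KeyError.
    """
    total = 0
    previous = 0
    # Walk right to left: a smaller symbol left of a larger one is subtracted.
    for symbol in reversed(numeral):
        value = ROMAN_VALUES[symbol]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


# Placeholder dictionary shaped like the "length" output described above.
chr_length = {"X": 300, "II": 200, "IX": 250, "I": 100, "IV": 400}

# Sort the chromosome keys numerically instead of alphabetically.
ordered = sorted(chr_length, key=roman_to_int)
print(ordered)
```

The same `key=roman_to_int` argument can be reused for the start and end dictionaries, since all three share the same keys.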
- transposonmapper.properties.gene_aliases(gene_information_file=None)
Create three dictionaries containing aliases for genes. The input is the path to the ‘Protein_Names.txt’ file downloaded from https://www.uniprot.org/docs/yeast. If no input is given, the file is searched for automatically at thisscriptlocation/../Data_Files/Yeast_Protein_Names.txt. The output is three dictionaries: aliases_designation_dict holds the gene aliases for common names (e.g. Bem1 and Sro1); aliases_sgd_dict holds the gene aliases for the search names in SGD (e.g. Bem1 and S000000404); aliases_swissprot_dict holds the gene aliases for Swiss-Prot. The keys of the dictionaries are the systematic names of the genes (e.g. YBR200W for Bem1).
- To find the key that corresponds to a given alias, search through the value lists:
[key for key, val in aliases.items() if 'TFC3' in val]
- Parameters
gene_information_file (str, optional) – Path to the ‘Protein_Names.txt’ file downloaded from https://www.uniprot.org/docs/yeast. If no input is given, the file Yeast_Protein_Names.txt inside the package is used. Defaults to None.
- Returns
dict – gene aliases for common names (e.g. Bem1 and Sro1)
dict – gene aliases for the search names in SGD (e.g. Bem1 and S000000404)
dict – gene aliases for Swiss Prot
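The reverse lookup mentioned in the description can be sketched on a small stand-in dictionary. The entries below are illustrative only and assume that each value is a list of alias strings. The function name `find_systematic_names` and the inverted index `alias_to_key` are hypothetical and not part of transposonmapper.

```python
# Illustrative data only: systematic name -> list of aliases,
# shaped like the dictionaries described above.
aliases = {
    "YBR200W": ["BEM1", "SRO1"],
    "YAL001C": ["TFC3", "FUN24"],
}


def find_systematic_names(aliases, alias):
    """Return every key whose alias list contains `alias`."""
    # Assumes each value is a list; on a plain string, `in` would
    # match substrings instead of whole names.
    return [key for key, val in aliases.items() if alias in val]


print(find_systematic_names(aliases, "TFC3"))

# For many lookups, an inverted index avoids rescanning the dictionary.
alias_to_key = {}
for key, names in aliases.items():
    for name in names:
        alias_to_key.setdefault(name, []).append(key)

print(alias_to_key["SRO1"])
```

The list comprehension scans the whole dictionary on every call, so the inverted index is worth building only when many aliases need to be resolved.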
- transposonmapper.properties.gene_position(gff_file, get_dict=True)
Get the start and end position of each gene and determine their respective length. The input is a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index. The output is a dictionary with all gene names as keys. Each value is a list with four entries: the chromosome number the gene belongs to, the start position, the end position (both in base pairs), and the reading orientation of the gene. The reading orientation is indicated with ‘+’ (forward reading) or ‘-’ (reverse reading). By default, get_dict is True and the output is this dictionary. When get_dict is set to False, the code returns all the values as individual lists.
- Parameters
gff_file (str) – The file path of a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index
get_dict (bool, optional) – When set to False, the code returns all the values as individual lists instead of a dictionary. Defaults to True.
- Returns
A dictionary with all gene names as keys. Each value is a list with four entries: the chromosome number the gene belongs to, the start position, the end position (both in base pairs), and the reading orientation of the gene, indicated with ‘+’ (forward reading) or ‘-’ (reverse reading).
- Return type
dict
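A minimal sketch of how a dictionary with this layout might be consumed is given here. The gene names and coordinates are made up, the helper names are hypothetical, and the length formula assumes the 1-based inclusive coordinates used by the GFF format.

```python
# Illustrative entries: gene -> [chromosome, start, end, orientation].
gene_pos = {
    "GENE_A": ["I", 100, 399, "+"],
    "GENE_B": ["I", 500, 1099, "-"],
    "GENE_C": ["II", 50, 149, "+"],
}


def gene_length(entry):
    """Length in base pairs of one [chromosome, start, end, orientation] entry."""
    _chrom, start, end, _orientation = entry
    # Assumption: coordinates are 1-based and inclusive, as in GFF.
    return end - start + 1


def genes_on_chromosome(gene_pos, chrom):
    """Names of the genes on `chrom`, ordered by start position."""
    return sorted(
        (name for name, entry in gene_pos.items() if entry[0] == chrom),
        key=lambda name: gene_pos[name][1],
    )


lengths = {name: gene_length(entry) for name, entry in gene_pos.items()}
print(lengths)
print(genes_on_chromosome(gene_pos, "I"))
```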
- transposonmapper.properties.get_chromosome_names(bam)
This function translates the chromosome names in the alignment file, as provided by the pysam module, into numbers.
- Parameters
bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb").
- Returns
ref_tid – A dictionary whose values are the chromosome numbers, one per key in the bam file.
- Return type
dict
- transposonmapper.properties.get_chromosome_reads(bam)
This function returns statistics about the mapped and unmapped reads per chromosome as they are stored in the index. It makes use of the get_index_statistics() method from the pysam module.
- Parameters
bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb").
- Returns
mapped_reads – A dictionary keyed by chromosome name (e.g. ‘I’), where each value is the list [mapped, unmapped, total reads].
- Return type
dict[str, list]
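As a sketch of how the returned structure might be used, the example below computes the fraction of mapped reads per chromosome. The read counts are invented and `mapped_fraction` is a hypothetical helper, not part of transposonmapper.

```python
# Invented counts: chromosome -> [mapped, unmapped, total reads].
mapped_reads = {
    "I": [900, 100, 1000],
    "II": [1500, 500, 2000],
    "III": [0, 0, 0],
}


def mapped_fraction(mapped_reads):
    """Fraction of mapped reads per chromosome (0.0 when there are no reads)."""
    fractions = {}
    for chrom, (mapped, _unmapped, total) in mapped_reads.items():
        # Guard against chromosomes without any reads.
        fractions[chrom] = mapped / total if total else 0.0
    return fractions


fractions = mapped_fraction(mapped_reads)
print(fractions)
```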
- transposonmapper.properties.get_coordinates_genes(path: str = '', data_files: dict = {})
Get the coordinates of all genes.
- Parameters
path (str) –
data_files (dict) –
- Returns
essential_coordinates (dict)
aliases_designation (dict)
- transposonmapper.properties.get_sequence_length(bam)
This function returns the length of each chromosome and a cumulative sum of them.
- Parameters
bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb").
- Returns
chr_lengths (dict) – A dictionary in which every value is the length in base pairs of a chromosome in the alignment file.
chr_lengths_cumsum (dict) – A dictionary in which every value is the summed length in base pairs of the previous chromosomes in the alignment file, i.e. chr_lengths_cumsum[n] = cumulative sum of the lengths of the n-1 previous chromosomes.
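The relation between the two dictionaries can be sketched as follows. This is a stand-alone illustration of the cumulative sum described above and not the package's own implementation. The function name `cumulative_offsets` is hypothetical and the lengths are illustrative values.

```python
from itertools import accumulate

# Illustrative lengths in base pairs, in chromosome order.
chr_lengths = {"I": 230218, "II": 813184, "III": 316620}


def cumulative_offsets(chr_lengths):
    """Map each chromosome to the summed length of all previous chromosomes."""
    names = list(chr_lengths)
    # Running totals with a leading 0, so the first chromosome starts at 0.
    totals = [0, *accumulate(chr_lengths.values())]
    # zip stops at the shorter sequence, which drops the final grand total.
    return dict(zip(names, totals))


chr_lengths_cumsum = cumulative_offsets(chr_lengths)
print(chr_lengths_cumsum)
```

Offsets of this kind are typically added to a chromosome-local position to place it on a single genome-wide axis. The result depends on the insertion order of `chr_lengths`, so the dictionary must already be in chromosome order.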
- transposonmapper.properties.list_gene_names(gene_information_file=None)
Create a list of all known gene names and their aliases as listed on SGD (or as provided in an optional input file). The input is a standard file downloaded from https://www.uniprot.org/docs/yeast. The output is a list of all genes, which also includes all the aliases (if they exist).
- Parameters
gene_information_file (str, optional) – A standard file downloaded from https://www.uniprot.org/docs/yeast. Defaults to None.
- Returns
A list of all genes, which also includes all the aliases (if they exist).
- Return type
list