transposonmapper.properties

transposonmapper.properties.chromosome_position(gff_file)

Get the start and end position of each chromosome and determine their respective lengths. Input is a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index. Output is three dictionaries: chromosome length, start position, and end position. All dictionaries use the chromosome number in Roman numerals as keys.

To get all three dictionaries, use:

a, b, c = chromosome_position(gff_file)

where ‘a’ is the chromosome length, ‘b’ the chromosome start position, and ‘c’ the chromosome end position.

Parameters

gff_file (str) – The file path of a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index

Returns

  • dict – A dictionary relating each chromosome with its length

  • dict – A dictionary relating each chromosome with its start position

  • dict – A dictionary relating each chromosome with its end position
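The shape of the three returned dictionaries can be illustrated with a minimal sketch. It is not the library's implementation: it assumes the .gff file contains one `chromosome` feature row per chromosome and reads start and end directly from columns 4 and 5, whereas the library may define start and end differently (for example as genome-wide cumulative positions). The helper name `chromosome_position_sketch` and the sample rows are made up for illustration.

```python
def chromosome_position_sketch(gff_lines):
    """Hypothetical helper: build length/start/end dicts from GFF rows."""
    lengths, starts, ends = {}, {}, {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != "chromosome":
            continue  # only chromosome feature rows are of interest here
        name, start, end = fields[0], int(fields[3]), int(fields[4])
        starts[name] = start
        ends[name] = end
        lengths[name] = end - start + 1  # GFF coordinates are 1-based, inclusive
    return lengths, starts, ends


# Made-up rows in GFF column order: seqid, source, type, start, end, ...
sample_gff = [
    "##gff-version 3",
    "I\texample\tchromosome\t1\t1000\t.\t.\t.\tID=chromosome:I",
    "I\texample\tgene\t10\t200\t.\t+\t.\tID=gene:placeholder",
    "II\texample\tchromosome\t1\t2500\t.\t.\t.\tID=chromosome:II",
]

chr_length, chr_start, chr_end = chromosome_position_sketch(sample_gff)
print(chr_length)
```

All three dictionaries share the same Roman-numeral keys, so a value for one chromosome can be looked up consistently, e.g. `chr_length["II"]`.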

transposonmapper.properties.gene_aliases(gene_information_file=None)

Create three dictionaries containing aliases for genes. Input is the path to the ‘Protein_Names.txt’ file downloaded from https://www.uniprot.org/docs/yeast. If no input is given, the file is automatically searched for at thisscriptlocation/../Data_Files/Yeast_Protein_Names.txt. Output is three dictionaries:

  • aliases_designation_dict – gene aliases for common names (e.g. Bem1 and Sro1)

  • aliases_sgd_dict – gene aliases for the search names in SGD (e.g. Bem1 and S000000404)

  • aliases_swissprot_dict – gene aliases for Swiss Prot

The keys of the dictionaries are the systematic names of the genes (e.g. YBR200W for Bem1).

To find the key (systematic name) that corresponds to a given alias, search through the alias lists of one of the dictionaries:

[key for key, val in aliases.items() if 'TFC3' in val]

Parameters

gene_information_file (str, optional) – Path to the ‘Protein_Names.txt’ file downloaded from https://www.uniprot.org/docs/yeast. If no input is given, the file is automatically searched for as Yeast_Protein_Names.txt inside the package. By default None

Returns

  • dict – gene aliases for common names (e.g. Bem1 and Sro1)

  • dict – gene aliases for the search names in SGD (e.g. Bem1 and S000000404)

  • dict – gene aliases for Swiss Prot
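The reverse lookup described above (from an alias back to the systematic name) can be sketched as follows. The dictionary here is a hand-written stand-in, not output of the library: the YBR200W entry follows the Bem1/Sro1 example from the description, the second entry is a made-up placeholder, and the helper name `systematic_names_for` is hypothetical.

```python
# Hypothetical stand-in for one of the returned dictionaries:
# systematic name -> list of aliases.
aliases_designation = {
    "YBR200W": ["Bem1", "Sro1"],  # example taken from the description
    "YXX000W": ["Abc1"],          # made-up placeholder entry
}


def systematic_names_for(alias, aliases):
    """Return every systematic name whose alias list contains `alias`."""
    return [key for key, val in aliases.items() if alias in val]


print(systematic_names_for("Sro1", aliases_designation))
```

The lookup returns a list because nothing in the structure prevents one alias from appearing under several systematic names; an unknown alias yields an empty list.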

transposonmapper.properties.gene_position(gff_file, get_dict=True)

Get the start and end position of each gene and determine their respective lengths. Input is a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index. Output is a dictionary that includes all gene names as keys. The values are lists with four entries: the chromosome number the gene belongs to, the start position and the end position of the gene in basepairs, and the reading orientation of the gene. The reading orientation is indicated with a ‘+’ (forward reading) or ‘-’ (reverse reading).

By default, get_dict is True and the output is a dictionary with the genes as keys and a list of these parameters as each value. When get_dict is set to False, the code returns all the values as individual lists.

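Parameters

  • gff_file (str) – The file path of a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index

  • get_dict (bool, optional) – If True, return the output as a dictionary; if False, return all the values as individual lists. By default True

Returns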

A dictionary that includes all gene names as keys. The values are lists with four entries: the chromosome number the gene belongs to, the start position and the end position of the gene in basepairs, and the reading orientation of the gene, indicated with a ‘+’ (forward reading) or ‘-’ (reverse reading).

Return type

dict
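Working with the documented value layout [chromosome, start, end, orientation] might look like the sketch below. The dictionary is a hand-written stand-in with made-up gene names and coordinates, the length calculation assumes 1-based inclusive coordinates, and the order of the individual lists that the library returns with get_dict=False is not specified here, so the last step only illustrates the idea.

```python
# Hypothetical dictionary in the documented layout:
# gene name -> [chromosome, start, end, orientation]. All values are made up.
gene_pos = {
    "GENE_A": ["I", 100, 400, "+"],
    "GENE_B": ["I", 900, 1500, "-"],
    "GENE_C": ["II", 50, 350, "+"],
}

# Gene length in basepairs, assuming 1-based inclusive coordinates.
gene_length = {
    name: end - start + 1 for name, (_, start, end, _) in gene_pos.items()
}

# Genes on the reverse strand of chromosome I.
reverse_on_I = [
    name
    for name, (chrom, _, _, strand) in gene_pos.items()
    if chrom == "I" and strand == "-"
]

# Unpack the dictionary into parallel lists, similar in spirit to get_dict=False.
names = list(gene_pos)
chroms, starts, ends, strands = (list(col) for col in zip(*gene_pos.values()))
```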

transposonmapper.properties.get_chromosome_names(bam)

This function translates the chromosome names in the alignment file, as read by the pysam module, into numbers.

Parameters

bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb")

Returns

ref_tid – A dictionary in which the values are the chromosome numbers for each key in the bam file.

Return type

dict
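A minimal sketch of such a name-to-number mapping is shown below. It assumes that the chromosome number is the zero-based position of the reference in the file header, which is what pysam's `get_tid()` reports; whether the library uses exactly this numbering is an assumption. A plain tuple stands in for `bam.references` so that the example runs without an alignment file, and the reference names are assumed for illustration.

```python
# Stand-in for `bam.references`, which pysam exposes as a tuple of reference
# names in header order. The names below are assumed for illustration.
references = ("I", "II", "III", "Mito")

# Assumption: the chromosome number is the zero-based index of the reference,
# matching what bam.get_tid(name) would return for an open alignment file.
ref_tid = {name: index for index, name in enumerate(references)}
print(ref_tid)
```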

transposonmapper.properties.get_chromosome_reads(bam)

This function returns statistics about mapped/unmapped reads per chromosome as they are stored in the index. It uses the get_index_statistics() method from the pysam module.

Parameters

bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb")

Returns

mapped_reads – Dictionary with one entry per chromosome, in the form ‘I’: [mapped, unmapped, total reads]

Return type

dict[str, list]
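The documented output shape can be sketched from the records that pysam's `get_index_statistics()` returns, each of which carries the fields `contig`, `mapped`, `unmapped` and `total`. The namedtuple below is a stand-in with the same field names so that the example runs without an indexed bam file; the read counts are made up, and the dictionary comprehension is an illustration rather than the library's code.

```python
from collections import namedtuple

# Stand-in with the same fields as the records returned by pysam's
# get_index_statistics(); the counts are made up.
IndexStats = namedtuple("IndexStats", ["contig", "mapped", "unmapped", "total"])
index_stats = [
    IndexStats("I", 120, 5, 125),
    IndexStats("II", 300, 10, 310),
]

# One entry per chromosome: name -> [mapped, unmapped, total reads].
mapped_reads = {s.contig: [s.mapped, s.unmapped, s.total] for s in index_stats}
print(mapped_reads["I"])
```

With an open, indexed alignment file, `index_stats` would instead come from `bam.get_index_statistics()`.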

transposonmapper.properties.get_coordinates_genes(path: str = '', data_files: dict = {})

Get coordinates of all genes

Parameters

  • path (str) –

  • data_files (dict) –

Returns

  • essential_coordinates (dict)

  • aliases_designation (dict)

transposonmapper.properties.get_sequence_length(bam)

This function returns the length of each chromosome and a cumulative sum of them.

Parameters

bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb")

Returns

  • chr_lengths (dict) – A dictionary in which every value is the length in basepairs of a chromosome in the alignment file.

  • chr_lengths_cumsum (dict) – A dictionary in which every value is the summed length in basepairs of the previous chromosomes in the alignment file: chr_lengths_cumsum[n] = cumulative sum of the lengths of the n-1 previous chromosomes.
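The cumulative sum of the previous chromosome lengths can be sketched as follows. Plain tuples stand in for `bam.references` and `bam.lengths` so that the example runs without an alignment file; the names and lengths are made up, and the code is an illustration of the definition rather than the library's implementation.

```python
from itertools import accumulate

# Stand-ins for `bam.references` and `bam.lengths`; the values are made up.
references = ("I", "II", "III")
lengths = (1000, 2500, 400)

# Length in basepairs of each chromosome.
chr_lengths = dict(zip(references, lengths))

# Offset of each chromosome: the summed length of all chromosomes before it,
# so the first chromosome gets 0.
offsets = accumulate((0,) + lengths[:-1])
chr_lengths_cumsum = dict(zip(references, offsets))
print(chr_lengths_cumsum)
```

Under this definition, a position within a chromosome can be converted to a genome-wide coordinate by adding `chr_lengths_cumsum[chromosome]` to it.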

transposonmapper.properties.list_gene_names(gene_information_file=None)

Create a list of all known gene names and their aliases as listed on SGD (or as provided in an optional input file). Input is a standard file downloaded from https://www.uniprot.org/docs/yeast. Output is a list of all genes, which also includes all the aliases (if they exist).

Parameters

gene_information_file (str, optional) – Path to a standard file downloaded from https://www.uniprot.org/docs/yeast. By default None

Returns

A list of all genes, which also includes all the aliases (if they exist).

Return type

list
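A flat list of this kind could be derived from an alias dictionary as sketched below. The dictionary is a hand-written stand-in (the YBR200W entry follows the Bem1/Sro1 example used for gene_aliases, the second entry is a made-up placeholder), and both the ordering and the inclusion of systematic names are assumptions rather than documented behavior.

```python
# Hypothetical alias dictionary: systematic name -> list of aliases.
aliases = {
    "YBR200W": ["Bem1", "Sro1"],
    "YXX000W": [],  # made-up placeholder without aliases
}

# Collect every systematic name followed by its aliases, if any.
gene_names = []
for systematic_name, alias_list in aliases.items():
    gene_names.append(systematic_name)
    gene_names.extend(alias_list)

print(gene_names)
```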