Source code for transposonmapper.properties.get_gene_position

def gene_position(gff_file, get_dict=True):
    """Get the chromosome, start position, end position and reading orientation of each gene.

    The input is a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index
    Only rows whose feature type is "gene" are read.
    By default (get_dict=True) the output is a dictionary with the gene names as keys. Each value is a list with four elements:
    the chromosome the gene belongs to (as written in the first column of the file), the start position, the end position (both in base pairs) and the reading orientation of the gene.
    The reading orientation is indicated with a '+' (forward reading) or '-' (reverse reading).
    When get_dict is set to False, the same values are returned as five separate lists instead.
    Gene lengths are not returned; because GFF coordinates are 1-based and inclusive, the length of a gene is end - start + 1.

    Parameters
    ----------
    gff_file : str
        The file path of a .gff file downloaded from https://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index
    get_dict : bool, optional
        When set to False, the values are returned as individual lists instead of a dictionary, by default True

    Returns
    -------
    dict
        Returned when get_dict is True. The keys are the gene names and each value is the list
        [chromosome, start position, end position, reading orientation].
        The positions are integers in base pairs; the reading orientation is '+' (forward reading) or '-' (reverse reading).
    tuple of list
        Returned when get_dict is False, in the order (gene_name, gene_chr, gene_start, gene_end, gene_orien).
        The lists are aligned by index, so element i of each list describes the same gene.
    """

    if get_dict:
        gene_pos_dict = {}
        with open(gff_file) as f:
            for line in f:
                line_list = line.split("\t")
                if len(line_list) > 2:
                    if line_list[2] == "gene":
                        gene_chr = line_list[0]
                        gene_start = line_list[3]
                        gene_end = line_list[4]
                        gene_orien = line_list[6]
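                        # Named gene_info so the local variable does not shadow this function's name.
                        gene_info = [
                            gene_chr,
                            int(gene_start),
                            int(gene_end),
                            gene_orien,
                        ]

                        # The first attribute has the form "ID=gene:<name>"; keep the part after the colon.
                        gene_name_string = line_list[8].split(";")[0]
                        gene_name = gene_name_string.split(":")[1]

                        gene_pos_dict[gene_name] = gene_info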

        return gene_pos_dict

    else:
        gene_chr = []
        gene_start = []
        gene_end = []
        gene_orien = []
        gene_name = []
        with open(gff_file) as f:
            for line in f:
                line_list = line.split("\t")
                if len(line_list) > 2:
                    if line_list[2] == "gene":
                        gene_chr.append(line_list[0])
                        gene_start.append(int(line_list[3]))
                        gene_end.append(int(line_list[4]))
                        gene_orien.append(line_list[6])

                        gene_name_string = line_list[8].split(";")[0]
                        gene_name.append(gene_name_string.split(":")[1])

        return (gene_name, gene_chr, gene_start, gene_end, gene_orien)
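
The two branches above repeat the same parsing loop. As a minimal sketch (not part of transposonmapper), both output shapes could be built from one pass over the file. The helper name `iter_gene_records` is hypothetical, and the sample rows, including the gene name `YBL999C`, are made up for illustration; the attribute layout `ID=gene:<name>;...` is assumed to match the Ensembl file.

```python
import os
import tempfile


def iter_gene_records(gff_file):
    """Yield (name, chromosome, start, end, orientation) for each 'gene' row.

    Hypothetical helper for illustration; it is not part of transposonmapper.
    """
    with open(gff_file) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # Skip comments, directives and every feature that is not a gene.
            if len(fields) < 9 or fields[2] != "gene":
                continue
            # Assumes the first attribute looks like "ID=gene:YAL069W".
            name = fields[8].split(";")[0].split(":")[1]
            yield name, fields[0], int(fields[3]), int(fields[4]), fields[6]


# Made-up rows in GFF3 column layout, used only to exercise the helper.
sample = (
    "##gff-version 3\n"
    "I\tsgd\tgene\t335\t649\t.\t+\t.\tID=gene:YAL069W;biotype=protein_coding\n"
    "I\tsgd\tmRNA\t335\t649\t.\t+\t.\tID=transcript:YAL069W_mRNA\n"
    "II\tsgd\tgene\t1000\t1500\t.\t-\t.\tID=gene:YBL999C;biotype=protein_coding\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".gff", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

try:
    records = list(iter_gene_records(sample_path))
finally:
    os.remove(sample_path)

# Dictionary form, the shape returned when get_dict=True.
gene_pos_dict = {
    name: [chrom, start, end, orien]
    for name, chrom, start, end, orien in records
}

# Parallel-list form, the shape returned when get_dict=False.
# Note: unpacking like this requires at least one gene record.
gene_name, gene_chr, gene_start, gene_end, gene_orien = (
    list(column) for column in zip(*records)
)
```

Whether a shared generator is preferable to the two explicit branches is a design choice; it mainly keeps the column indices and the name-parsing rule in one place.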