Reference

ce_detector.detector
ce_detector.annotator
ce_detector.scanner

ce_detector.detector

Classes for detecting junction reads.

class ce_detector.detector.JunctionDetector(bam_file, reference, quality=0, output=None)

Class for detecting junction reads and recording their positions.

Parameters

bam_file (str) – path to the BAM file
reference (str) – filename of the genome reference
quality (int) – quality threshold for filtering junction reads
output (str) – filename of the output. Defaults to None

static check_strand(anchor, acceptor)

Determine the strand of a read from its anchor and acceptor.

Parameters

anchor (str) – anchor of read
acceptor (str) – acceptor of read

Returns

strand of the read, either - or +

Return type

str
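The motif logic behind check_strand is not spelled out in this reference. The following is a minimal sketch, assuming that anchor and acceptor are the dinucleotides at the left and right ends of the junction on the forward reference strand and that only canonical GT-AG introns are considered; infer_strand is a hypothetical name, not part of ce_detector:

```python
# Hypothetical sketch: infer strand from splice-site dinucleotides.
# Assumes anchor/acceptor are read from the forward reference strand,
# anchor at the junction start and acceptor at the junction end.
from typing import Optional

# A canonical GT...AG intron on the '+' strand appears on the forward
# reference as GT...AG; on the '-' strand it appears reverse-complemented
# as CT...AC.
FORWARD_MOTIF = ("GT", "AG")
REVERSE_MOTIF = ("CT", "AC")


def infer_strand(anchor: str, acceptor: str) -> Optional[str]:
    """Return '+' or '-' for canonical motifs and None for anything else."""
    pair = (anchor.upper(), acceptor.upper())
    if pair == FORWARD_MOTIF:
        return "+"
    if pair == REVERSE_MOTIF:
        return "-"
    return None


print(infer_strand("GT", "AG"))  # +
print(infer_strand("ct", "ac"))  # -
```

Non-canonical motifs such as GC-AG or AT-AC return None here; a fuller version would list them explicitly.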

run(logger, verbose=False)

Detect junction reads, annotate their splice sites and write the results to file.

Returns: the populated JunctionMap instance
Return type: JunctionMap

worker(bam_file, reference, chrom, quality, idn, junctionmap)

Find the junction reads on one chromosome and annotate their splice sites.

Parameters

bam_file (instance) – handle of the opened BAM file
reference (instance) – handle of the opened genome reference
chrom (str) – chromosome to scan
quality (int) – quality threshold for filtering reads
idn (int) – identifier of reads
junctionmap (JunctionMap) – JunctionMap instance that collects the reads

Returns

the updated JunctionMap instance

Return type

JunctionMap
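worker is not documented in detail. A common way to find junction reads is to look for N (skipped region) operations in each alignment's CIGAR string. The sketch below assumes pysam-style input, where AlignedSegment.reference_start gives the 0-based start and AlignedSegment.cigartuples gives (operation, length) pairs; junctions_from_cigar is a hypothetical helper, and the quality filter (for example on AlignedSegment.mapping_quality) is left out:

```python
# Hypothetical sketch: list the junctions (introns) spanned by one alignment,
# given its 0-based reference start and pysam-style CIGAR tuples.
from typing import List, Tuple

# pysam CIGAR operation codes that consume reference bases:
# 0=M, 2=D, 3=N (skipped region, i.e. intron), 7='=', 8=X.
REF_CONSUMING = {0, 2, 3, 7, 8}
SKIP = 3


def junctions_from_cigar(ref_start: int, cigar: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) of each N operation, 0-based and half-open."""
    junctions = []
    pos = ref_start
    for op, length in cigar:
        if op == SKIP:
            junctions.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return junctions


# 50M200N50M aligned at position 100 spans one junction.
print(junctions_from_cigar(100, [(0, 50), (3, 200), (0, 50)]))  # [(150, 350)]
```

Soft clips (4) and insertions (1) do not advance the reference position, so they do not shift the reported junction coordinates.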

class ce_detector.detector.JunctionMap

Container that stores information on all junction reads.

add_read(read)

Add a read to the junction list.

Parameters: read (Read) – a Read instance

get_read(identifiers)

Get a read from the junction list by its identifier.

Parameters: identifiers (str) – identifier of the read, in the form chrom_start_end
Returns: the matching Read instance
Return type: Read

write2file(output, header=None)

Write all reads in the junction list to file.

Parameters

output (str or TextIO) – filename of the output, or an open text stream
header (str) – header line of the output
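A minimal, self-contained sketch of a container offering the same three operations as JunctionMap. The classes Read and SimpleJunctionMap below are hypothetical stand-ins rather than the ce_detector classes, the tab-separated layout is an assumption, and adding a read whose chrom_start_end key already exists is assumed to add to its score; ce_detector may handle duplicates differently:

```python
# Hypothetical stand-ins for Read and JunctionMap; not the ce_detector API.
import io
from dataclasses import astuple, dataclass
from typing import Dict, Optional, TextIO, Union


@dataclass
class Read:
    chrom: str
    start: int
    end: int
    idn: int
    score: int
    strand: str
    anchor: str
    acceptor: str

    @property
    def key(self) -> str:
        # Identifier in the chrom_start_end form used by get_read.
        return f"{self.chrom}_{self.start}_{self.end}"


class SimpleJunctionMap:
    def __init__(self) -> None:
        self._reads: Dict[str, Read] = {}

    def add_read(self, read: Read) -> None:
        # Assumption: a repeated junction adds its support to the stored read.
        if read.key in self._reads:
            self._reads[read.key].score += read.score
        else:
            self._reads[read.key] = read

    def get_read(self, key: str) -> Read:
        return self._reads[key]

    def write2file(self, output: Union[str, TextIO], header: Optional[str] = None) -> None:
        # Accept either a filename or an already open text stream.
        if isinstance(output, str):
            with open(output, "w") as handle:
                self._write(handle, header)
        else:
            self._write(output, header)

    def _write(self, handle: TextIO, header: Optional[str]) -> None:
        if header:
            handle.write(header.rstrip("\n") + "\n")
        for read in self._reads.values():
            handle.write("\t".join(str(value) for value in astuple(read)) + "\n")


jmap = SimpleJunctionMap()
jmap.add_read(Read("chr1", 150, 350, 0, 12, "+", "GT", "AG"))
jmap.add_read(Read("chr1", 150, 350, 1, 3, "+", "GT", "AG"))
buf = io.StringIO()
jmap.write2file(buf, header="chrom\tstart\tend\tidn\tscore\tstrand\tanchor\tacceptor")
print(buf.getvalue())
```

Keying the dictionary by chrom_start_end makes get_read a constant-time lookup and merges reads that support the same junction.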

class ce_detector.detector.Read(chrom, start, end, idn, score, strand, anchor, acceptor)

Read class that stores the information of a single junction read.

Parameters

chrom (str) – chromosome
start (int) – start position of the junction read
end (int) – end position of the junction read
idn (int) – index of the junction read
score (int) – support of the junction read
strand (str) – strand of the junction read (-|+)
anchor (str) – anchor of the junction read
acceptor (str) – acceptor of the junction read

ce_detector.annotator

Classes for annotating junction reads.

class ce_detector.annotator.Annotator(junctionmap, database: Any, output=None)

Annotate junction reads.

Parameters

junctionmap (JunctionMap) – instance of ce_detector.detector.JunctionMap
database (Any) – database of annotation files
output (TextIO) – destination of the annotated junction reads. Defaults to None

annotate_junction(read, result, db)

Annotate a junction read and write the results to file.

Parameters

read (Read) – junction read, an instance of ce_detector.detector.Read
result (defaultdict[Any, Any]) – gene list used for annotation
db (instance of file) – handle of the annotation database

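static detect_property(start, end, junction_list)

Detect the splice type, the number of skipped donors and the number of skipped acceptors of a junction read.

The splice types are D, A, DA, N and NDA, as defined by regtools: DA, the junction itself is known; NDA, both donor and acceptor are known but the junction is novel; D, only the donor is known; A, only the acceptor is known; N, neither is known.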

Parameters

start (int) – start of junction read
end (int) – end of junction read
junction_list (numpy.array) – gene list of junction reads

Returns

the splice type, the number of skipped donors and the number of skipped acceptors
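detect_property works on the package's own gene list. The sketch below only illustrates the regtools-style classification on the '+' strand, assuming that the known junctions of a gene are given as (start, end) pairs, so junction starts act as known donors and junction ends as known acceptors; classify_junction is hypothetical, and regtools itself derives donors and acceptors from annotated exons:

```python
# Hypothetical regtools-style classification for a '+' strand junction.
from typing import Iterable, Tuple


def classify_junction(
    start: int, end: int, known: Iterable[Tuple[int, int]]
) -> Tuple[str, int, int]:
    """Return (splice type, skipped donors, skipped acceptors)."""
    known_set = set(known)
    donors = {s for s, _ in known_set}     # known junction starts
    acceptors = {e for _, e in known_set}  # known junction ends
    # Known sites strictly inside the junction are skipped by it.
    skipped_donors = sum(start < d < end for d in donors)
    skipped_acceptors = sum(start < a < end for a in acceptors)
    if (start, end) in known_set:
        kind = "DA"
    elif start in donors and end in acceptors:
        kind = "NDA"
    elif start in donors:
        kind = "D"
    elif end in acceptors:
        kind = "A"
    else:
        kind = "N"
    return kind, skipped_donors, skipped_acceptors


known = [(100, 200), (300, 400)]
print(classify_junction(100, 400, known))  # ('NDA', 1, 1)
print(classify_junction(150, 250, known))  # ('N', 0, 1)
```

On the '-' strand the roles of start and end swap, so a strand-aware version would exchange donors and acceptors before classifying.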

run(logger, verbose=False)

Main function for annotating junction reads.

For each junction read, pick all genes it covers and annotate each of them with the splice type, the number of skipped donors and the number of skipped acceptors.
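How run selects the genes covered by a junction read is not specified. Assuming that "covered" means the gene interval overlaps the junction in half-open coordinates, a brute-force sketch with a hypothetical genes_covered helper looks like this; an annotator working with many genes would use an interval index rather than a linear scan:

```python
# Hypothetical helper: genes whose [start, end) interval overlaps a junction.
from typing import Dict, List, Tuple


def genes_covered(start: int, end: int, genes: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the sorted names of genes overlapping the half-open [start, end)."""
    return sorted(
        name
        for name, (gene_start, gene_end) in genes.items()
        if gene_start < end and start < gene_end
    )


genes = {"geneA": (0, 1000), "geneB": (500, 2000), "geneC": (3000, 4000)}
print(genes_covered(900, 1500, genes))  # ['geneA', 'geneB']
```

Each gene returned would then be annotated separately with its splice type and skipped-site counts.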

ce_detector.scanner

Classes and functions for scanning cryptic exons based on the results of the junction detector and the junction annotator.

class ce_detector.scanner.Scanner(cutoff, output)

Class for scanning cryptic exons based on annotated junction reads.

Parameters

cutoff (int) – cutoff for filtering out junction reads with relatively low score or depth
output (str) – filename of the results

run(junctionmap, logger, verbose=False) → Iterable

Scan for cryptic exons.

Parameters

junctionmap (JunctionMap) – instance of ce_detector.detector.JunctionMap
logger – logger used to report progress
verbose (bool) – whether to report verbose output

Returns

temporary result used to store cryptic exons

Return type

Iterable

write2file(logger, verbose=False)

Consume the iterator returned by run() and write the results to file.
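Scanner.run returns an iterable that write2file later consumes, so the results can be produced lazily. The sketch below shows that generator pattern with pandas; LazyScanner is hypothetical, it drops the logger and verbose arguments, and it assumes that cutoff is compared against a score column and that results are yielded per strand:

```python
# Hypothetical sketch of a lazy run()/write2file() pair; not the ce_detector API.
import io
from typing import Iterator, TextIO

import pandas as pd


class LazyScanner:
    def __init__(self, cutoff: int, output: TextIO) -> None:
        self.cutoff = cutoff
        self.output = output
        self._result: Iterator[pd.DataFrame] = iter(())

    def run(self, reads: pd.DataFrame) -> Iterator[pd.DataFrame]:
        def scan() -> Iterator[pd.DataFrame]:
            # Assumption: drop reads whose score is below the cutoff,
            # then yield one DataFrame per strand.
            kept = reads[reads["score"] >= self.cutoff]
            for _, frame in kept.groupby("strand"):
                yield frame

        # Creating the generator does no work yet.
        self._result = scan()
        return self._result

    def write2file(self) -> None:
        # Consuming the generator triggers the actual scanning.
        header = True
        for frame in self._result:
            frame.to_csv(self.output, sep="\t", index=False, header=header)
            header = False


reads = pd.DataFrame(
    {"chrom": ["chr1"] * 3, "start": [10, 20, 30], "end": [15, 25, 35],
     "score": [5, 1, 8], "strand": ["+", "+", "-"]}
)
buf = io.StringIO()
scanner = LazyScanner(cutoff=3, output=buf)
scanner.run(reads)
scanner.write2file()
print(buf.getvalue())
```

Separating run from write2file this way keeps only one strand's results in memory at a time while writing.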

ce_detector.scanner.assign_value(df_ce, ces, ns, ce_id, ns_id) → None

Assign the value of the children column for every cryptic exon that contains N-type junction reads.

Children are determined from the start and end positions of the cryptic exons and of the junction reads. Note: junction reads are typed as N, D, A, DA or NDA; for details, see https://regtools.readthedocs.io/en/latest/commands/junctions-annotate/

Parameters

df_ce (pandas.DataFrame) – pandas.DataFrame of cryptic exons
ces (pandas.DataFrame) – pandas.DataFrame of the cryptic exons whose gene ids are shared with the junction reads; it contains only the gene id, start and end
ns (pandas.DataFrame) – pandas.DataFrame of N-type junction reads that share gene ids with the cryptic exons
ce_id (numpy.array) – indices, by gene id, of the cryptic exons that have junction reads as children
ns_id (numpy.array) – indices, by gene id, of the N-type junction reads that are children of those cryptic exons
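assign_value works with precomputed index arrays. An equivalent, self-contained way to compute a children column with pandas is to join cryptic exons and N-type junction reads on gene id and keep the pairs where the junction lies strictly inside the exon. The column names gene_id, start and end, the strict containment rule and the use of a child count rather than child identifiers are assumptions; count_children is a hypothetical helper:

```python
# Hypothetical helper: count N-type junction reads lying strictly inside each
# cryptic exon of the same gene. Column names are assumptions.
import pandas as pd


def count_children(df_ce: pd.DataFrame, df_n: pd.DataFrame) -> pd.DataFrame:
    # Pair every cryptic exon with every N-type read of the same gene,
    # keeping the exon's original row label in the "index" column.
    pairs = df_ce.reset_index().merge(
        df_n[["gene_id", "start", "end"]], on="gene_id", suffixes=("_ce", "_n")
    )
    inside = (pairs["start_n"] > pairs["start_ce"]) & (pairs["end_n"] < pairs["end_ce"])
    counts = pairs.loc[inside].groupby("index").size()
    out = df_ce.copy()
    out["children"] = counts.reindex(out.index, fill_value=0).to_numpy()
    return out


df_ce = pd.DataFrame({"gene_id": ["g1", "g1", "g2"], "start": [100, 500, 100], "end": [200, 600, 200]})
df_n = pd.DataFrame({"gene_id": ["g1", "g1", "g2"], "start": [120, 130, 300], "end": [180, 150, 350]})
print(count_children(df_ce, df_n)["children"].tolist())  # [2, 0, 0]
```

Merging on gene id first restricts the containment test to reads and exons of the same gene, mirroring the shared-gene-id condition described for ces and ns.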

ce_detector.scanner.check(axis) → numpy.array

Check whether a cryptic exon has children.

That is, check whether the cryptic exon is split by other junction reads, based on their start and end positions.

Parameters: axis (numpy.array) – array describing a junction read, including its start and end
Returns: whether the junction read splits the cryptic exon
Return type: bool
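A vectorised sketch of the containment test described for check, assuming that a junction splits a cryptic exon when it starts after the exon start and ends before the exon end; splits_exon is a hypothetical helper that tests many junctions against one exon at once:

```python
# Hypothetical sketch: which junctions lie strictly inside one cryptic exon.
import numpy as np


def splits_exon(ce_start: int, ce_end: int, junctions: np.ndarray) -> np.ndarray:
    """junctions: (n, 2) array of [start, end]; returns a boolean mask."""
    starts, ends = junctions[:, 0], junctions[:, 1]
    return (starts > ce_start) & (ends < ce_end)


junctions = np.array([[120, 180], [90, 150], [130, 210]])
print(splits_exon(100, 200, junctions))  # [ True False False]
```

Comparing whole columns at once avoids a Python loop over junction reads.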

ce_detector.scanner.find_ce(groups) → Iterable

Parse the annotation results to detect cryptic exons.

Parameters: groups (DataFrameGroupBy returned by DataFrame.groupby) – annotated junction reads grouped by strand and type
Returns: a pandas.DataFrame for each of the two strands
Return type: Iterable
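find_ce receives junction reads already grouped by strand and type. The sketch below shows only how such a groupby object can be iterated and regrouped into one DataFrame per strand; frames_per_strand is hypothetical, and the rule that turns these reads into cryptic exons is not described in this reference, so it is not reproduced:

```python
# Hypothetical sketch: regroup (strand, type) groups into one frame per strand.
from typing import Dict, Iterator, List, Tuple

import pandas as pd


def frames_per_strand(groups) -> Iterator[Tuple[str, pd.DataFrame]]:
    by_strand: Dict[str, List[pd.DataFrame]] = {}
    # Iterating a groupby over two keys yields ((strand, type), frame) pairs.
    for (strand, _junction_type), frame in groups:
        by_strand.setdefault(strand, []).append(frame)
    for strand, frames in by_strand.items():
        yield strand, pd.concat(frames)


annotated = pd.DataFrame(
    {"strand": ["+", "+", "-"], "type": ["A", "D", "N"], "start": [100, 300, 500]}
)
for strand, frame in frames_per_strand(annotated.groupby(["strand", "type"])):
    print(strand, len(frame))
```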

ce_detector.scanner.split_ce(df_ce, df_n) → Iterable

Iterator that checks whether the detected cryptic exons are split by other junction reads.

Parameters

df_ce (pandas.DataFrame) – pandas.DataFrame of cryptic exons return from find_ce()
df_n (pandas.DataFrame) – pandas.DataFrame of junction reads with N type

Returns

df_ce with a new children column

Return type

iterator