WikiPathways¶
This module contains the methods to convert a WikiPathways RDF network into a BELGraph.
-
pathme.wikipathways.convert_to_bel.
convert_to_bel
(nodes: Dict[str, Dict], complexes: Dict[str, Dict], interactions: Dict[str, Dict], pathway_info, hgnc_manager: bio2bel_hgnc.manager.Manager) → pybel.struct.graph.BELGraph[source]¶ Convert RDF graph info to BEL.
This module contains the methods that run SPARQL queries to create the WikiPathways Graphs.
-
pathme.wikipathways.rdf_sparql.
PREFIXES
= {'chebi': Namespace('http://identifiers.org/chebi/'), 'chemspider': Namespace('http://identifiers.org/chemspider/'), 'dc': Namespace('http://purl.org/dc/elements/1.1/'), 'dcterms': Namespace('http://purl.org/dc/terms/'), 'ensembl': Namespace('http://identifiers.org/ensembl/'), 'hgnc': Namespace('http://identifiers.org/hgnc.symbol/'), 'hmdb': Namespace('http://identifiers.org/hmdb/'), 'ncbigene': Namespace('http://identifiers.org/ncbigene/'), 'pubchem': Namespace('http://rdf.ncbi.nlm.nih.gov/pubchem/compound/'), 'rdf': rdf.namespace.ClosedNamespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#'), 'rdfs': rdf.namespace.ClosedNamespace('http://www.w3.org/2000/01/rdf-schema#'), 'uniprot': Namespace('http://identifiers.org/uniprot/'), 'wikidata': Namespace('http://www.wikidata.org/entity/'), 'wp': Namespace('http://vocabularies.wikipathways.org/wp#')}¶ SPARQL prefixes.
-
pathme.wikipathways.rdf_sparql.
GET_ENTRIES_SUBTYPES_SPARQL
= '\nSELECT DISTINCT ?uri_id ?uri_type\nWHERE {{\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:{rdf_type} .\n ?uri_id rdf:type ?uri_type .\n}}\n'¶ SPARQL query to get all the subtypes for a specific primary type, {rdf_type} (DataNode or Interaction), in a pathway network.
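The doubled braces in this template suggest that it is meant to be filled in with `str.format`. The following is a minimal sketch under that assumption; the template text is copied from the value above, and the variable names are illustrative only.

```python
# Template text copied from GET_ENTRIES_SUBTYPES_SPARQL. The doubled braces
# escape the literal SPARQL braces, so str.format only fills {rdf_type}.
query_template = """
SELECT DISTINCT ?uri_id ?uri_type
WHERE {{
 ?pathway a wp:Pathway .
 ?uri_id dcterms:isPartOf ?pathway .
 ?uri_id a wp:{rdf_type} .
 ?uri_id rdf:type ?uri_type .
}}
"""

# Assumption: the two primary types named in the description are the
# intended values for the placeholder.
data_node_query = query_template.format(rdf_type="DataNode")
interaction_query = query_template.format(rdf_type="Interaction")

print(data_node_query)
```

With rdflib, a string like this could be passed to `Graph.query` together with `initNs=PREFIXES` so that the `wp:`, `dcterms:` and `rdf:` prefixes resolve; whether the package does exactly this is not stated here.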
-
pathme.wikipathways.rdf_sparql.
GET_ALL_DATA_NODES_SPARQL
= '\nSELECT DISTINCT\n ?uri_id\n ?name\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?node_types)\n (?uri_id AS ?identifier)\n (?dc_identifier AS ?identifier)\n (STRAFTER(STR(?ncbigene_uri), str(ncbigene:)) AS ?identifier)\n (STRAFTER(STR(?chebi_uri), str(chebi:)) AS ?identifier)\n (STRAFTER(STR(?hgnc_uri), str(hgnc:)) AS ?bdb_hgncsymbol)\n (STRAFTER(STR(?ensembl_uri), str(ensembl:)) AS ?bdb_ensembl)\n (STRAFTER(STR(?ncbigene_uri), str(ncbigene:)) AS ?bdb_ncbigene)\n (STRAFTER(STR(?uniprot_uri), str(uniprot:)) AS ?bdb_uniprot)\n (STRAFTER(STR(?chebi_uri), str(chebi:)) AS ?bdb_chebi)\n (STRAFTER(STR(?chemspider_uri), str(chemspider:)) AS ?bdb_chemspider)\n (STRAFTER(STR(?pubchem_uri), str(pubchem:)) AS ?bdb_pubchem)\n (STRAFTER(STR(?wikidata_uri), str(wikidata:)) AS ?bdb_wikidata)\n (STRAFTER(STR(?hmdb_uri), str(hmdb:)) AS ?bdb_hmdb)\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n\n ?uri_id a wp:DataNode .\n ?uri_id rdf:type ?uri_type .\n\n optional {?uri_id dcterms:identifier ?dc_identifier .}\n optional {?uri_id wp:bdbHgncSymbol ?hgnc_uri .}\n optional {?uri_id wp:bdbEnsembl ?ensembl_uri .}\n optional {?uri_id wp:bdbEntrezGene ?ncbigene_uri .}\n optional {?uri_id wp:bdbUniprot ?uniprot_uri .}\n optional {?uri_id wp:bdbChEBI ?chebi_uri .}\n optional {?uri_id wp:bdbChemspider ?chemspider_uri .}\n optional {?uri_id wp:bdbPubChem ?pubchem_uri .}\n optional {?uri_id wp:bdbWikidata ?wikidata_uri .}\n optional {?uri_id wp:bdbHmdb ?hmdba_uri .}\n\n ?uri_id rdfs:label ?name .\n}\n'¶ SPARQL query to get all data nodes in a pathway network with some arguments.
-
pathme.wikipathways.rdf_sparql.
GET_ALL_COMPLEXES_SPARQL
= '\nSELECT DISTINCT\n ?uri_id\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?node_types)\n (?participants_entry AS ?participants)\n (?participants_id AS ?participants)\n ?name\n (STRAFTER(STR(?ncbigene_participants), str(ncbigene:)) AS ?participants)\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:Complex .\n ?uri_id rdf:type ?uri_type .\n ?uri_id wp:participants ?participants_entry .\n optional {?participants_entry dcterms:identifier ?participants_id .}\n optional {?participants_entry wp:bdbEntrezGene ?ncbigene_participants .}\n}\n'¶ SPARQL query to get all complexes in a pathway network with their participants.
-
pathme.wikipathways.rdf_sparql.
GET_ALL_DIRECTED_INTERACTIONS_SPARQL
= '\nSELECT DISTINCT\n (?source_entry AS ?source)\n (?dc_source AS ?source)\n (?target_entry AS ?target)\n (?dc_target AS ?target)\n ?uri_id\n (STRAFTER(STR(?uri_id), "/Interaction/") AS ?identifier)\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?interaction_types)\n (STRAFTER(STR(?ncbigene_source), str(ncbigene:)) AS ?source )\n (STRAFTER(STR(?ncbigene_target), str(ncbigene:)) AS ?target )\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:DirectedInteraction .\n ?uri_id rdf:type ?uri_type .\n ?uri_id wp:source ?source_entry .\n ?uri_id wp:target ?target_entry .\n optional {?source_entry dcterms:identifier ?dc_source .}\n optional {?target_entry dcterms:identifier ?dc_target .}\n optional {?source_entry wp:bdbEntrezGene ?ncbigene_source .}\n optional {?target_entry wp:bdbEntrezGene ?ncbigene_target .}\n}\n'¶ SPARQL query to get all directed interactions in a pathway network with source and target.
-
pathme.wikipathways.rdf_sparql.
GET_PATHWAY_INFO_SPARQL
= '\nSELECT DISTINCT ?title ?identifier ?description ?pathway_id\nWHERE {\n ?pathway_id a wp:Pathway .\n ?pathway_id dc:title ?title .\n ?pathway_id dcterms:description ?description .\n ?pathway_id dcterms:identifier ?identifier .\n}\n'¶ Queries managers
-
pathme.wikipathways.rdf_sparql.
get_wp_statistics
(resource_files, resource_folder, hgnc_manager) → Tuple[Dict[str, Dict[str, int]], Dict[str, Dict[str, Dict[str, int]]]][source]¶ Load WikiPathways RDF to BELGraph.
-
pathme.wikipathways.rdf_sparql.
rdf_wikipathways_to_bel
(rdf_graph: rdflib.graph.Graph, hgnc_manager) → pybel.struct.graph.BELGraph[source]¶ Convert RDF graph to BELGraph.
- Parameters
rdf_graph – RDF graph
hgnc_manager (bio2bel_hgnc.Manager) – HGNC manager
-
pathme.wikipathways.rdf_sparql.
wikipathways_to_bel
(file_path: str, hgnc_manager)[source]¶ Convert WikiPathways RDF file to BEL.
- Parameters
file_path (str) – path to the file
hgnc_manager (bio2bel_hgnc.Manager) – HGNC manager
- Return type
-
pathme.wikipathways.rdf_sparql.
wikipathways_to_pickles
(resource_files: Iterable[str], resource_folder: str, hgnc_manager: bio2bel_hgnc.manager.Manager, export_folder: str) → None[source]¶ Export WikiPathways to Pickles.
- Parameters
resource_files – iterator with file names
resource_folder – path folder
hgnc_manager – HGNC manager
export_folder – export folder
This module contains utility methods for parsing and handling WikiPathways RDF and data.
-
pathme.wikipathways.utils.
evaluate_wikipathways_metadata
(metadata: Union[str, Set[str]]) → str[source]¶ Evaluate metadata in wikipathways and return the string representation.
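The signature accepts either a single string or a set of strings and always returns a string. A minimal sketch of one way to do that is shown here; the function name, the separator and the sorting are assumptions for illustration, not the package's actual behaviour.

```python
from typing import Set, Union


def metadata_to_string(metadata: Union[str, Set[str]]) -> str:
    """Hypothetical stand-in: collapse a set of values into one string."""
    if isinstance(metadata, set):
        # Sorting makes the result deterministic; the real function may
        # use a different order or separator.
        return ", ".join(sorted(metadata))
    return metadata


print(metadata_to_string("Apoptosis"))
print(metadata_to_string({"b", "a"}))
```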
-
pathme.wikipathways.utils.
get_valid_gene_identifier
(node_ids_dict, hgnc_manager: bio2bel_hgnc.manager.Manager, pathway_id) → Tuple[str, str, str][source]¶ Return protein/gene identifier for a given RDF node.
- Parameters
node_ids_dict (dict) – node dictionary
hgnc_manager – hgnc manager
- Returns
namespace, name, identifier
-
pathme.wikipathways.utils.
check_multiple
(element, element_name, pathway_id)[source]¶ Check whether element is iterable.
- Parameters
element – variable to check
element_name – name to print
- Returns
-
pathme.wikipathways.utils.
convert_to_nx
(nodes: Dict[str, Dict], interactions: List[Tuple[str, str, Dict]], pathway_info: Dict) → networkx.classes.multidigraph.MultiDiGraph[source]¶ Generate a NetworkX Graph from a network data structure (dict with nodes and edges).
- Parameters
nodes – Node id as keys and Node attributes as values
interactions – list of interactions
pathway_info – pathway info dictionary
-
pathme.wikipathways.utils.
debug_pathway_info
(bel_graph: pybel.struct.graph.BELGraph, pathway_path, **kwargs)[source]¶ Debug information about the pathway graph representation.
- Parameters
bel_graph (pybel.BELGraph) – bel graph
pathway_path (str) – path of the pathway
-
pathme.wikipathways.utils.
debug_global_statistics
(global_statistics)[source]¶ Debug pathway statistics.
- Parameters
global_statistics (dict) – pathway statistics
-
pathme.wikipathways.utils.
get_file_name_from_url
(url: str) → str[source]¶ Get the last part of a URL.
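As an illustration of what "the last part of a URL" can mean, the sketch below keeps everything after the final slash. The function name and the example URL are hypothetical, and stripping a trailing slash is an assumption that the real function may not share.

```python
def last_url_part(url: str) -> str:
    """Hypothetical stand-in for get_file_name_from_url."""
    # Drop a trailing slash (an assumption), then keep the text after
    # the final remaining slash.
    return url.rstrip("/").rsplit("/", 1)[-1]


print(last_url_part("http://example.org/data/archive.zip"))
```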
-
pathme.wikipathways.utils.
unzip_file
(file_path: str, export_folder: str)[source]¶ Unzip file into a destination folder.
- Parameters
file_path – path of the file to unzip
export_folder – destination folder
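Unzipping into a destination folder can be done with the standard library alone. The sketch below uses `zipfile` and is only an assumed equivalent of this function; the name `unzip_to_folder` is hypothetical.

```python
import zipfile


def unzip_to_folder(file_path: str, export_folder: str) -> None:
    """Hypothetical stand-in: extract every member of a zip archive."""
    with zipfile.ZipFile(file_path, "r") as zip_file:
        # extractall creates the destination folder if it does not exist.
        zip_file.extractall(export_folder)
```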
-
pathme.wikipathways.utils.
filter_wikipathways_files
(file_names: Iterable[str]) → List[str][source]¶ Filter out files that do not have the ‘ttl’ extension or do not start with ‘WP’.
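The filtering rule described above can be sketched as a single list comprehension. The function name and the sample file names are hypothetical, and treating the extension as the suffix `.ttl` is an assumption.

```python
from typing import Iterable, List


def keep_wikipathways_ttl(file_names: Iterable[str]) -> List[str]:
    """Hypothetical stand-in: keep names that start with WP and end in .ttl."""
    return [
        name
        for name in file_names
        if name.startswith("WP") and name.endswith(".ttl")
    ]


names = ["WP22_r97775.ttl", "README.txt", "WP100.zip", "notes.ttl"]
print(keep_wikipathways_ttl(names))
```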
-
pathme.wikipathways.utils.
iterate_wikipathways_paths
(directory: str, connection: Optional[str] = None, only_canonical: bool = True) → List[str][source]¶ Get WikiPathways RDF files in folder.
- Parameters
directory – folder path
connection – database connection
only_canonical – only identifiers present in WP bio2bel db
Custom parser¶
This module contains the custom parser for RDF.
-
pathme.wikipathways.json_rdf_parser.
parse_id_uri
(uri)[source]¶ Get the components of a given uri (with identifier at the last position).
- Parameters
uri (str) – URI
- Returns
prefix (ex: http://rdf.wikipathways.org/…)
- Returns
prefix_namespaces: if there are many namespaces, until the penultimate (ex: …/Pathway/WP22_r97775/…)
- Returns
namespace: if there are many namespaces, the last (ex: …/Interaction/)
- Returns
identifier (ex: …/c562c/)
- Return type
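To make the four components concrete, the sketch below splits a URI assembled from the example fragments above. It is an assumed reading of the description, not the package's implementation; the function name, the sample URI and the exact form of each returned piece (for example, without slashes) are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_id_uri(uri: str) -> Tuple[str, str, str, str]:
    """Hypothetical stand-in for parse_id_uri."""
    parts = urlsplit(uri)
    prefix = f"{parts.scheme}://{parts.netloc}"
    # Path segments without empty strings from leading or trailing slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    identifier = segments[-1] if segments else ""
    namespace = segments[-2] if len(segments) > 1 else ""
    # Everything before the last namespace, joined back into one string.
    prefix_namespaces = "/".join(segments[:-2])
    return prefix, prefix_namespaces, namespace, identifier


# Sample URI built from the fragments in the description (an assumption).
uri = "http://rdf.wikipathways.org/Pathway/WP22_r97775/Interaction/c562c"
print(split_id_uri(uri))
```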
-
pathme.wikipathways.json_rdf_parser.
parse_namespace_uri
(uri)[source]¶ Get the prefix and namespace of a given URI (without an identifier, only with a namespace at the last position).
- Parameters
uri (str) – URI
- Returns
prefix (ex: http://purl.org/dc/terms/…)
- Returns
namespace (ex: …/isPartOf)
- Return type
-
pathme.wikipathways.json_rdf_parser.
match_entry_type
(types)[source]¶ Get the type identifier from the type attributes of an entry.
-
pathme.wikipathways.json_rdf_parser.
match_attribute_label
(attribute_namespace)[source]¶ Assign an attribute label that depends on the namespace of the attribute label (attribute key).
-
pathme.wikipathways.json_rdf_parser.
match_entry
(entry)[source]¶ For a given entry, get the identifier of the entry and also the entry_type.
-
pathme.wikipathways.json_rdf_parser.
match_attribute
(uri)[source]¶ For a given attribute @id URI, get the label to be assigned to the attribute.
-
pathme.wikipathways.json_rdf_parser.
get_entry_attribute_value
(entry_label, node_id, attribute_label, graph)[source]¶ For a given node (if entry type is nodes), entry_type and attribute, get the associated value in the graph object.
-
pathme.wikipathways.json_rdf_parser.
set_entry_attribute
(entry_type, node_id, attribute_label, value, graph)[source]¶ For a given node (if the entry type is ‘nodes’), entry_type and attribute, set the associated value in the graph object.
-
pathme.wikipathways.json_rdf_parser.
set_interaction
(entry, graph)[source]¶ For a given interaction entry, set the source and target (and the interaction id) in the pathway graph.
-
pathme.wikipathways.json_rdf_parser.
get_entry_type
(types)[source]¶ For a set of URIs that indicate the entry’s type, get the type identifier (calls match_entry_type).
-
pathme.wikipathways.json_rdf_parser.
parse_attribute_values
(entry_label, entry_id, attribute_values, attribute_label, graph)[source]¶ For each value in attribute_values, add a new entry to the graph by calling set_entry_attribute, taking the attribute_label type into account (if one is specified, value_namespace is the namespace of the value). This is the last level of parsing. The value is stored as a set if there are multiple values for the same attribute_label, or as a single value otherwise.
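The "single value or set" rule described above can be sketched independently of the graph structure. The helper below is hypothetical and works on a plain dict of attributes; how the real parser stores values inside its graph object is not shown here.

```python
from typing import Any, Dict


def add_attribute_value(attributes: Dict[str, Any], label: str, value: Any) -> None:
    """Hypothetical helper: keep one value as-is, promote to a set on repeats."""
    if label not in attributes:
        # First value for this label: store it directly.
        attributes[label] = value
    elif isinstance(attributes[label], set):
        # Already a set: add the new value.
        attributes[label].add(value)
    elif attributes[label] != value:
        # Second distinct value: promote the single value to a set.
        attributes[label] = {attributes[label], value}


node: Dict[str, Any] = {}
add_attribute_value(node, "name", "TP53")
add_attribute_value(node, "identifier", "7157")
add_attribute_value(node, "identifier", "P04637")
print(node)
```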
-
pathme.wikipathways.json_rdf_parser.
parse_attributes
(entry, entry_type, entry_id, graph)[source]¶ For each attribute in attributes, if it is labelled with a URI (not one of {‘@id’, ‘@value’, ‘@type’}), get the attribute_type (by calling the corresponding function) and pass it to the next step (parse_attribute_values).
-
pathme.wikipathways.json_rdf_parser.
generate_empty_pathway_graph
()[source]¶ Test set entry attribute.
-
pathme.wikipathways.json_rdf_parser.
parse_entries
(entries)[source]¶ Create the graph object, which is finally returned filled with the values produced by the chain of parser calls (entry -> attributes -> values). First, the type and the id of the entry are obtained with match_entry, and the entry is handled according to the retrieved type. If the entry is of type ‘interactions’, only set_interaction is called; otherwise, the deeper parsing levels are called (parse_attributes and parse_attribute_values).
-
pathme.wikipathways.json_rdf_parser.
convert_json
(graph: rdflib.graph.Graph)[source]¶ Convert an imported rdflib graph object to a Python data structure (a list of dicts, one per entry).
-
pathme.wikipathways.json_rdf_parser.
parse_pathway
(pathway_path)[source]¶ After importing the indicated pathway from the text file resources into an rdflib graph object (import_pathway), call the data type transformation (convert_json function) and the first step of the parser, which returns a graph data structure (parse_entries function). The retrieved graph is then converted to a NetworkX graph (convert_to_nx function).
- Parameters
pathway_path (str) – pathway identifier
- Return type
networkx.MultiDiGraph