WikiPathways

This module contains the methods to convert a WikiPathways RDF network into a BELGraph.

pathme.wikipathways.convert_to_bel.convert_to_bel(nodes: Dict[str, Dict], complexes: Dict[str, Dict], interactions: Dict[str, Dict], pathway_info, hgnc_manager: bio2bel_hgnc.manager.Manager) → pybel.struct.graph.BELGraph

Convert the parsed RDF graph information (nodes, complexes, interactions, and pathway info) to a BELGraph.

This module contains the methods that run SPARQL queries to build the WikiPathways graphs.

pathme.wikipathways.rdf_sparql.PREFIXES = {'chebi': Namespace('http://identifiers.org/chebi/'), 'chemspider': Namespace('http://identifiers.org/chemspider/'), 'dc': Namespace('http://purl.org/dc/elements/1.1/'), 'dcterms': Namespace('http://purl.org/dc/terms/'), 'ensembl': Namespace('http://identifiers.org/ensembl/'), 'hgnc': Namespace('http://identifiers.org/hgnc.symbol/'), 'hmdb': Namespace('http://identifiers.org/hmdb/'), 'ncbigene': Namespace('http://identifiers.org/ncbigene/'), 'pubchem': Namespace('http://rdf.ncbi.nlm.nih.gov/pubchem/compound/'), 'rdf': rdf.namespace.ClosedNamespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#'), 'rdfs': rdf.namespace.ClosedNamespace('http://www.w3.org/2000/01/rdf-schema#'), 'uniprot': Namespace('http://identifiers.org/uniprot/'), 'wikidata': Namespace('http://www.wikidata.org/entity/'), 'wp': Namespace('http://vocabularies.wikipathways.org/wp#')}

SPARQL prefixes.
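The PREFIXES mapping binds the short names (wp:, dcterms:, ncbigene:, …) that the queries below rely on. With rdflib such bindings are typically passed to Graph.query through its initNs argument. The stdlib-only sketch below shows, under that assumption, how the same kind of mapping expands into explicit SPARQL PREFIX declarations; build_prefix_header is a hypothetical helper, and plain strings stand in for rdflib Namespace objects.

```python
from typing import Mapping


def build_prefix_header(prefixes: Mapping[str, str]) -> str:
    """Render a prefix -> namespace URI mapping as SPARQL PREFIX lines (hypothetical helper)."""
    # One "PREFIX name: <uri>" declaration per entry, sorted for stable output.
    return "\n".join(
        f"PREFIX {name}: <{uri}>" for name, uri in sorted(prefixes.items())
    )


# Plain-string subset of the PREFIXES mapping documented above.
prefixes = {
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "dcterms": "http://purl.org/dc/terms/",
}
print(build_prefix_header(prefixes))
# PREFIX dcterms: <http://purl.org/dc/terms/>
# PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
```

Prepending such a header to a query string would make it self-describing; passing the mapping to the query engine instead keeps the query strings short, as the constants below do.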

pathme.wikipathways.rdf_sparql.GET_ENTRIES_SUBTYPES_SPARQL = '\nSELECT DISTINCT ?uri_id ?uri_type\nWHERE {{\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:{rdf_type} .\n ?uri_id rdf:type ?uri_type .\n}}\n'

SPARQL query template to get all the subtypes of a given primary type (the {rdf_type} placeholder: DataNode or Interaction) in a pathway network.
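The doubled braces ({{ and }}) in GET_ENTRIES_SUBTYPES_SPARQL indicate a template for Python's str.format, which substitutes {rdf_type} and collapses the doubled braces into the single braces SPARQL expects. A minimal sketch, assuming that usage; the template is copied verbatim from the constant above.

```python
# Copy of the GET_ENTRIES_SUBTYPES_SPARQL template documented above.
GET_ENTRIES_SUBTYPES_SPARQL = '\nSELECT DISTINCT ?uri_id ?uri_type\nWHERE {{\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:{rdf_type} .\n ?uri_id rdf:type ?uri_type .\n}}\n'

# Fill in the primary type; str.format also turns "{{" / "}}" into "{" / "}".
data_node_query = GET_ENTRIES_SUBTYPES_SPARQL.format(rdf_type='DataNode')
interaction_query = GET_ENTRIES_SUBTYPES_SPARQL.format(rdf_type='Interaction')

print(data_node_query)
```

The printed query contains the triple pattern "?uri_id a wp:DataNode ." inside a regular WHERE { … } block, ready to be run against a pathway graph with the PREFIXES bindings.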

pathme.wikipathways.rdf_sparql.GET_ALL_DATA_NODES_SPARQL = '\nSELECT DISTINCT\n ?uri_id\n ?name\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?node_types)\n (?uri_id AS ?identifier)\n (?dc_identifier AS ?identifier)\n (STRAFTER(STR(?ncbigene_uri), str(ncbigene:)) AS ?identifier)\n (STRAFTER(STR(?chebi_uri), str(chebi:)) AS ?identifier)\n (STRAFTER(STR(?hgnc_uri), str(hgnc:)) AS ?bdb_hgncsymbol)\n (STRAFTER(STR(?ensembl_uri), str(ensembl:)) AS ?bdb_ensembl)\n (STRAFTER(STR(?ncbigene_uri), str(ncbigene:)) AS ?bdb_ncbigene)\n (STRAFTER(STR(?uniprot_uri), str(uniprot:)) AS ?bdb_uniprot)\n (STRAFTER(STR(?chebi_uri), str(chebi:)) AS ?bdb_chebi)\n (STRAFTER(STR(?chemspider_uri), str(chemspider:)) AS ?bdb_chemspider)\n (STRAFTER(STR(?pubchem_uri), str(pubchem:)) AS ?bdb_pubchem)\n (STRAFTER(STR(?wikidata_uri), str(wikidata:)) AS ?bdb_wikidata)\n (STRAFTER(STR(?hmdb_uri), str(hmdb:)) AS ?bdb_hmdb)\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n\n ?uri_id a wp:DataNode .\n ?uri_id rdf:type ?uri_type .\n\n optional {?uri_id dcterms:identifier ?dc_identifier .}\n optional {?uri_id wp:bdbHgncSymbol ?hgnc_uri .}\n optional {?uri_id wp:bdbEnsembl ?ensembl_uri .}\n optional {?uri_id wp:bdbEntrezGene ?ncbigene_uri .}\n optional {?uri_id wp:bdbUniprot ?uniprot_uri .}\n optional {?uri_id wp:bdbChEBI ?chebi_uri .}\n optional {?uri_id wp:bdbChemspider ?chemspider_uri .}\n optional {?uri_id wp:bdbPubChem ?pubchem_uri .}\n optional {?uri_id wp:bdbWikidata ?wikidata_uri .}\n optional {?uri_id wp:bdbHmdb ?hmdba_uri .}\n\n ?uri_id rdfs:label ?name .\n}\n'

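SPARQL query to get all data nodes in a pathway network, with their names, node types, and cross-reference identifiers (HGNC symbol, Ensembl, Entrez Gene, UniProt, ChEBI, ChemSpider, PubChem, Wikidata, and HMDB). Note that the HMDB clause binds ?hmdba_uri while the projection reads ?hmdb_uri, so ?bdb_hmdb is never bound as written.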
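Each bdb_* column is produced with STRAFTER(STR(?uri), str(prefix:)), which keeps only the part of the cross-reference URI after the namespace, i.e. the bare identifier. The sketch below mirrors the SPARQL 1.1 STRAFTER semantics in plain Python to show what ends up in those columns; strafter is a hypothetical helper, and the Entrez Gene URI is an example built from the ncbigene prefix in PREFIXES.

```python
def strafter(value: str, separator: str) -> str:
    """Python analogue of SPARQL 1.1 STRAFTER (hypothetical helper)."""
    if separator == "":
        # STRAFTER with an empty second argument returns the whole string.
        return value
    head, found, tail = value.partition(separator)
    # No match yields the empty string, as in SPARQL.
    return tail if found else ""


NCBIGENE = "http://identifiers.org/ncbigene/"
print(strafter(NCBIGENE + "7157", NCBIGENE))  # 7157
print(repr(strafter("http://example.org/x", NCBIGENE)))  # ''
```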

pathme.wikipathways.rdf_sparql.GET_ALL_COMPLEXES_SPARQL = '\nSELECT DISTINCT\n ?uri_id\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?node_types)\n (?participants_entry AS ?participants)\n (?participants_id AS ?participants)\n ?name\n (STRAFTER(STR(?ncbigene_participants), str(ncbigene:)) AS ?participants)\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:Complex .\n ?uri_id rdf:type ?uri_type .\n ?uri_id wp:participants ?participants_entry .\n optional {?participants_entry dcterms:identifier ?participants_id .}\n optional {?participants_entry wp:bdbEntrezGene ?ncbigene_participants .}\n}\n'

SPARQL query to get all complexes in a pathway network, together with their participants (as entry URIs, dcterms:identifier values, or Entrez Gene IDs).
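Because GET_ALL_COMPLEXES_SPARQL returns one participant per result row, a caller has to collect the rows of each complex into a single participant set. A minimal sketch, assuming the results have already been turned into dicts keyed by the projected variable names (uri_id, participants); the helper group_participants and the row values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def group_participants(rows: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Collect the participants of each complex from one-row-per-participant results."""
    complexes: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        # Rows without a usable participant value are skipped.
        participant = row.get("participants")
        if participant:
            complexes[row["uri_id"]].add(participant)
    return dict(complexes)


rows = [
    {"uri_id": "complex_a", "participants": "1234"},
    {"uri_id": "complex_a", "participants": "5678"},
    {"uri_id": "complex_b", "participants": "1234"},
]
print(group_participants(rows))
```

Using sets also absorbs the duplicate rows that SELECT DISTINCT can still produce when a participant is bound through more than one alias.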

pathme.wikipathways.rdf_sparql.GET_ALL_DIRECTED_INTERACTIONS_SPARQL = '\nSELECT DISTINCT\n (?source_entry AS ?source)\n (?dc_source AS ?source)\n (?target_entry AS ?target)\n (?dc_target AS ?target)\n ?uri_id\n (STRAFTER(STR(?uri_id), "/Interaction/") AS ?identifier)\n (STRAFTER(STR(?uri_type), str(wp:)) AS ?interaction_types)\n (STRAFTER(STR(?ncbigene_source), str(ncbigene:)) AS ?source )\n (STRAFTER(STR(?ncbigene_target), str(ncbigene:)) AS ?target )\nWHERE {\n ?pathway a wp:Pathway .\n ?uri_id dcterms:isPartOf ?pathway .\n ?uri_id a wp:DirectedInteraction .\n ?uri_id rdf:type ?uri_type .\n ?uri_id wp:source ?source_entry .\n ?uri_id wp:target ?target_entry .\n optional {?source_entry dcterms:identifier ?dc_source .}\n optional {?target_entry dcterms:identifier ?dc_target .}\n optional {?source_entry wp:bdbEntrezGene ?ncbigene_source .}\n optional {?target_entry wp:bdbEntrezGene ?ncbigene_target .}\n}\n'

SPARQL query to get all directed interactions in a pathway network with source and target.
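The projected source, target, identifier, and interaction_types columns map naturally onto the (source, target, attributes) tuples that convert_to_nx (documented below) accepts as interactions. A sketch of that reshaping, assuming dict rows keyed by those variable names; rows_to_interactions and the sample values are hypothetical, and merging several rows for the same interaction is left out.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def rows_to_interactions(
    rows: Iterable[Mapping[str, str]],
) -> List[Tuple[str, str, Dict[str, str]]]:
    """Turn directed-interaction result rows into (source, target, attributes) tuples."""
    interactions = []
    for row in rows:
        # Keep the interaction ID and type as edge attributes.
        attributes = {
            "identifier": row["identifier"],
            "interaction_types": row["interaction_types"],
        }
        interactions.append((row["source"], row["target"], attributes))
    return interactions


rows = [
    {
        "source": "1234",
        "target": "5678",
        "identifier": "id1",
        "interaction_types": "DirectedInteraction",
    },
]
print(rows_to_interactions(rows))
# [('1234', '5678', {'identifier': 'id1', 'interaction_types': 'DirectedInteraction'})]
```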

pathme.wikipathways.rdf_sparql.GET_PATHWAY_INFO_SPARQL = '\nSELECT DISTINCT ?title ?identifier ?description ?pathway_id\nWHERE {\n ?pathway_id a wp:Pathway .\n ?pathway_id dc:title ?title .\n ?pathway_id dcterms:description ?description .\n ?pathway_id dcterms:identifier ?identifier .\n}\n'

SPARQL query to get the title, identifier, and description of each pathway, along with its URI (?pathway_id).

pathme.wikipathways.rdf_sparql.get_wp_statistics(resource_files, resource_folder, hgnc_manager) → Tuple[Dict[str, Dict[str, int]], Dict[str, Dict[str, Dict[str, int]]]]

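Load WikiPathways RDF files and collect statistics about their contents (global statistics and per-pathway statistics).

Parameters
  • resource_files (iter[str]) – RDF file names

  • resource_folder (str) – folder containing the RDF files

  • hgnc_manager (bio2bel_hgnc.Manager) – HGNC manager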

pathme.wikipathways.rdf_sparql.rdf_wikipathways_to_bel(rdf_graph: rdflib.graph.Graph, hgnc_manager) → pybel.struct.graph.BELGraph

Convert RDF graph to BELGraph.

Parameters
  • rdf_graph (rdflib.Graph) – RDF graph

  • hgnc_manager (bio2bel_hgnc.Manager) – HGNC manager

pathme.wikipathways.rdf_sparql.wikipathways_to_bel(file_path: str, hgnc_manager)

Convert WikiPathways RDF file to BEL.

Parameters
  • file_path (str) – path to the WikiPathways RDF file

  • hgnc_manager (bio2bel_hgnc.Manager) – HGNC manager

Return type

pybel.BELGraph

pathme.wikipathways.rdf_sparql.wikipathways_to_pickles(resource_files: Iterable[str], resource_folder: str, hgnc_manager: bio2bel_hgnc.manager.Manager, export_folder: str) → None

Export WikiPathways pathways as pickled BELGraphs.

Parameters
  • resource_files – iterable of RDF file names

  • resource_folder – folder containing the RDF files

  • hgnc_manager – HGNC manager

  • export_folder – folder where the pickles are written
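The export amounts to converting each RDF file and serializing the result into export_folder. The sketch below shows that loop with the standard pickle module; a caller-supplied convert function stands in for the RDF-to-BEL conversion (and its HGNC manager), and both the helper export_to_pickles and the .ttl to .pickle naming scheme are assumptions rather than PathMe's actual code. PyBEL also ships its own pickling helpers for BELGraphs.

```python
import os
import pickle
import tempfile
from typing import Any, Callable, Iterable, List


def export_to_pickles(
    resource_files: Iterable[str],
    resource_folder: str,
    export_folder: str,
    convert: Callable[[str], Any],
) -> List[str]:
    """Convert each resource file and pickle the result (hypothetical sketch)."""
    os.makedirs(export_folder, exist_ok=True)
    written = []
    for file_name in resource_files:
        graph = convert(os.path.join(resource_folder, file_name))
        # Assumed naming scheme: WP22.ttl -> WP22.pickle
        stem, _ = os.path.splitext(file_name)
        target = os.path.join(export_folder, stem + ".pickle")
        with open(target, "wb") as handle:
            pickle.dump(graph, handle)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    # The stand-in converter never opens the file, it only records its name.
    paths = export_to_pickles(
        ["WP22.ttl"], tmp, os.path.join(tmp, "pickles"),
        lambda path: {"source": os.path.basename(path)},
    )
    with open(paths[0], "rb") as handle:
        restored = pickle.load(handle)

print(os.path.basename(paths[0]), restored)  # WP22.pickle {'source': 'WP22.ttl'}
```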

This module contains utility methods for parsing and handling WikiPathways RDF and data.

pathme.wikipathways.utils.evaluate_wikipathways_metadata(metadata: Union[str, Set[str]]) → str

Evaluate WikiPathways metadata and return its string representation.

pathme.wikipathways.utils.get_valid_gene_identifier(node_ids_dict, hgnc_manager: bio2bel_hgnc.manager.Manager, pathway_id) → Tuple[str, str, str]

Return the protein/gene identifier for a given RDF node.

Parameters
  • node_ids_dict (dict) – node dictionary

  • hgnc_manager – HGNC manager

  • pathway_id – pathway identifier

Returns

namespace, name, identifier

pathme.wikipathways.utils.check_multiple(element, element_name, pathway_id)

Check whether element is iterable.

Parameters
  • element – variable to check

  • element_name – name to print

  • pathway_id – pathway identifier

pathme.wikipathways.utils.merge_two_dicts(dict1, dict2)

Merge two dictionaries.

Parameters
  • dict1 (dict) – first dictionary

  • dict2 (dict) – second dictionary

Returns

merged_dict

Return type

dict
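A minimal sketch of such a merge, assuming that keys present in both dictionaries take their value from dict2 (the usual Python convention); the conflict handling of PathMe's own function is not documented here, so treat this as an illustration only.

```python
from typing import Any, Dict


def merge_two_dicts(dict1: Dict[Any, Any], dict2: Dict[Any, Any]) -> Dict[Any, Any]:
    """Sketch only: return a new dict with dict1's items updated by dict2's."""
    merged_dict = dict1.copy()  # shallow copy, so dict1 is left unchanged
    merged_dict.update(dict2)  # dict2 wins on conflicting keys
    return merged_dict


print(merge_two_dicts({"a": 1, "b": 2}, {"b": 3, "c": 4}))
# {'a': 1, 'b': 3, 'c': 4}
```

Copying before updating keeps both inputs intact, which matters when the same node dictionary is reused across several merges.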

pathme.wikipathways.utils.convert_to_nx(nodes: Dict[str, Dict], interactions: List[Tuple[str, str, Dict]], pathway_info: Dict) → networkx.classes.multidigraph.MultiDiGraph

Generate a NetworkX MultiDiGraph from a network data structure (a dict with nodes and edges).

Parameters
  • nodes – dictionary with node IDs as keys and node attributes as values

  • interactions – list of (source, target, attributes) interaction tuples

  • pathway_info – pathway info dictionary

pathme.wikipathways.utils.debug_pathway_info(bel_graph: pybel.struct.graph.BELGraph, pathway_path, **kwargs)

Report debugging information about the pathway graph representation.

Parameters
  • bel_graph (pybel.BELGraph) – BEL graph

  • pathway_path (str) – path of the pathway

pathme.wikipathways.utils.debug_global_statistics(global_statistics)

Report debugging information about the global pathway statistics.

Parameters

global_statistics (dict) – pathway statistics

pathme.wikipathways.utils.get_file_name_from_url(url: str) → str

Get the last part of a URL.
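A sketch of this helper using the standard urllib.parse module. This version parses the URL first so that a query string or fragment does not end up in the file name; whether PathMe's implementation does the same is an assumption, and the example.org URL is made up.

```python
from urllib.parse import urlparse


def get_file_name_from_url(url: str) -> str:
    """Sketch: return the last path segment of a URL."""
    # Parse first so query strings and fragments are not part of the name.
    path = urlparse(url).path
    return path.rsplit("/", 1)[-1]


print(get_file_name_from_url("https://example.org/downloads/wikipathways-rdf-wp.zip"))
# wikipathways-rdf-wp.zip
```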

pathme.wikipathways.utils.unzip_file(file_path: str, export_folder: str)

Unzip file into a destination folder.

Parameters
  • file_path – path to the zip file

  • export_folder – destination folder
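Unzipping into a folder maps directly onto the standard zipfile module. A minimal sketch, assuming a plain zip archive; the usage builds a throwaway archive with a placeholder WP22.ttl member so the example runs anywhere.

```python
import os
import tempfile
import zipfile


def unzip_file(file_path: str, export_folder: str) -> None:
    """Sketch: extract every member of the zip archive at file_path into export_folder."""
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(export_folder)  # creates export_folder if needed


with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "rdf.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("WP22.ttl", "# placeholder content")
    unzip_file(archive_path, os.path.join(tmp, "extracted"))
    extracted = sorted(os.listdir(os.path.join(tmp, "extracted")))

print(extracted)  # ['WP22.ttl']
```

As the Python documentation warns, extractall should only be used on archives from trusted sources.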

pathme.wikipathways.utils.filter_wikipathways_files(file_names: Iterable[str]) → List[str]

Filter out files that do not have the ‘ttl’ extension or do not start with ‘WP’.
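The filter keeps only Turtle files named after WikiPathways identifiers. A minimal sketch of that rule, assuming both conditions are simple prefix and suffix checks on the file name; the sample names are made up.

```python
from typing import Iterable, List


def filter_wikipathways_files(file_names: Iterable[str]) -> List[str]:
    """Sketch: keep only '.ttl' files whose names start with 'WP'."""
    return [
        file_name
        for file_name in file_names
        if file_name.startswith("WP") and file_name.endswith(".ttl")
    ]


print(filter_wikipathways_files(["WP22.ttl", "WP22.json", "readme.ttl"]))
# ['WP22.ttl']
```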

pathme.wikipathways.utils.iterate_wikipathways_paths(directory: str, connection: Optional[str] = None, only_canonical: bool = True) → List[str]

Get WikiPathways RDF files in folder.

Parameters
  • directory – folder path

  • connection – database connection

  • only_canonical – if True, only keep pathways whose identifiers are present in the Bio2BEL WikiPathways database

Custom parser

This module contains the custom parser for RDF.

pathme.wikipathways.json_rdf_parser.parse_id_uri(uri)

Get the components of a given URI (with the identifier at the last position).

Parameters

uri (str) – URI

Returns
  • prefix (ex: http://rdf.wikipathways.org/…)

  • prefix_namespaces: if there are many namespaces, all of them up to the penultimate one (ex: …/Pathway/WP22_r97775/…)

  • namespace: if there are many namespaces, the last one (ex: …/Interaction/)

  • identifier (ex: …/c562c/)

Return type

tuple[str,str,str,str]

pathme.wikipathways.json_rdf_parser.parse_namespace_uri(uri)

Get the prefix and namespace of a given URI (without an identifier, only with a namespace at the last position).

Parameters

uri (str) – URI

Returns

prefix (ex: http://purl.org/dc/terms/…)

Returns

namespace (ex: …/isPartOf)

Return type

tuple[str,str]
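Using the documented example, http://purl.org/dc/terms/isPartOf splits into the prefix http://purl.org/dc/terms/ and the namespace isPartOf. A sketch of that split, assuming the boundary is the last ‘/’ or ‘#’ in the URI (the ‘#’ case covers vocabularies such as wp: in PREFIXES); split_namespace_uri is a hypothetical helper, not the parser's actual code.

```python
from typing import Tuple


def split_namespace_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (prefix, last segment) at its final '/' or '#' (hypothetical helper)."""
    # rfind returns -1 when a separator is absent, so the cut falls at 0 if neither occurs.
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


print(split_namespace_uri("http://purl.org/dc/terms/isPartOf"))
# ('http://purl.org/dc/terms/', 'isPartOf')
print(split_namespace_uri("http://vocabularies.wikipathways.org/wp#DataNode"))
# ('http://vocabularies.wikipathways.org/wp#', 'DataNode')
```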

pathme.wikipathways.json_rdf_parser.match_entry_type(types)

Get the type identifier from the type attributes of an entry.

Parameters

types (set) – set of WikiPathways vocabulary terms as strings.

Returns

entry_type: type identifier, also used as first key to the main graph (ex: ‘nodes’ -> graph[entry_type][node_id])

Return type

Optional[str]

pathme.wikipathways.json_rdf_parser.match_attribute_label(attribute_namespace)

Assign an attribute label according to the namespace of the attribute (the attribute key).

Parameters

attribute_namespace (str) – the namespace of the attribute specification

Returns

attribute_label: label to identify the attribute

Return type

Optional[str]

pathme.wikipathways.json_rdf_parser.match_entry(entry)

For a given entry, get the identifier and the entry_type of the entry.

Parameters

entry (dict) – entry to match

Returns
  • entry_id: the NCBI identifier for nodes, or the WikiPathways interaction ID for edges

  • entry_type: type identifier, also used as first key to the main graph (ex: ‘nodes’ -> graph[entry_type][node_id])

Return type

Optional[tuple[str,str]]

pathme.wikipathways.json_rdf_parser.match_attribute(uri)

For a given attribute @id URI, get the label to be assigned to the attribute.

Parameters

uri (str) – attribute_label as URI

Returns

attribute_label: label to be assigned to the attribute

Return type

Optional[str]

pathme.wikipathways.json_rdf_parser.get_entry_attribute_value(entry_label, node_id, attribute_label, graph)

For a given entry label, node (if the entry type is ‘nodes’), and attribute, get the associated value in the graph object.

Parameters
  • entry_label (str) – ex: ‘nodes’, ‘title’, ‘description’…

  • node_id (str) – NCBI identifier

  • attribute_label (str) – ex: ‘source’, ‘title’, ‘description’…

  • graph (dict) – graph object

Returns

attribute_value: associated value in the graph object

Return type

Optional[str]

pathme.wikipathways.json_rdf_parser.set_entry_attribute(entry_type, node_id, attribute_label, value, graph)

For a given entry type, node (if the entry type is ‘nodes’), and attribute, set the associated value in the graph object.

Parameters
  • entry_type (str) – entry type

  • node_id (str) – NCBI identifier

  • attribute_label (str) – attribute label

  • value (str) – value

  • graph (dict) – graph object
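The graph object read and written by these two functions is addressed as graph[entry_type][node_id][attribute_label]. A minimal sketch of the set/get pair, assuming that nested-dict layout and the behavior described for parse_attribute_values (a single value is stored as-is, and a second value for the same attribute turns it into a set); the _sketch function names are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumed layout: graph[entry_type][node_id][attribute_label] -> value or set of values
Graph = Dict[str, Dict[str, Dict[str, Any]]]


def set_entry_attribute_sketch(
    entry_type: str, node_id: str, attribute_label: str, value: str, graph: Graph
) -> None:
    """Store value, switching to a set once an attribute has several values."""
    entry = graph.setdefault(entry_type, {}).setdefault(node_id, {})
    current = entry.get(attribute_label)
    if current is None:
        entry[attribute_label] = value  # first value: store it as-is
    elif isinstance(current, set):
        current.add(value)  # already multi-valued: extend the set
    else:
        entry[attribute_label] = {current, value}  # second value: switch to a set


def get_entry_attribute_value_sketch(
    entry_label: str, node_id: str, attribute_label: str, graph: Graph
) -> Optional[Any]:
    """Return the stored value, or None if any level of the lookup is missing."""
    return graph.get(entry_label, {}).get(node_id, {}).get(attribute_label)


graph: Graph = {}
set_entry_attribute_sketch("nodes", "7157", "name", "TP53", graph)
set_entry_attribute_sketch("nodes", "7157", "name", "p53", graph)
print(get_entry_attribute_value_sketch("nodes", "7157", "name", graph))  # set with both names
print(get_entry_attribute_value_sketch("nodes", "7157", "title", graph))  # None
```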

pathme.wikipathways.json_rdf_parser.set_interaction(entry, graph)

For a given interaction entry, set the source and target (and the interaction ID) in the pathway graph.

Parameters
  • entry (dict) – entry whose type has been previously identified as ‘interactions’

  • graph (dict) – pathway network graph object

pathme.wikipathways.json_rdf_parser.get_entry_type(types)

For a set of URIs that indicate the entry’s type, get the type identifier (calls match_entry_type).

Parameters

types (set) – set of URIs (from the entry’s @type attribute) that indicate the entry’s type

Returns

entry_type: type identifier, also used as first key to the main graph (ex: ‘nodes’ -> graph[entry_type][node_id])

Return type

str

pathme.wikipathways.json_rdf_parser.parse_attribute_values(entry_label, entry_id, attribute_values, attribute_label, graph)

For each value in attribute_values, taking into account the attribute_label type (if it is specified, value_namespace is the namespace of the value), add a new entry to the graph by calling set_entry_attribute; this is the last level of parsing. The value is stored as a set if there are multiple values for the same attribute_label, or as a single value otherwise.

Parameters
  • entry_label (str) – entry label

  • entry_id (str) – entry identifier

  • attribute_values (list[dict]) – values

  • attribute_label (str) – label

  • graph (dict) – graph object

pathme.wikipathways.json_rdf_parser.parse_attributes(entry, entry_type, entry_id, graph)

For each attribute of the entry that is labeled as a URI (i.e. not one of ‘@id’, ‘@value’, ‘@type’), get the attribute type (by calling the corresponding function) and call the next parsing step (parse_attribute_values).

Parameters
  • entry (dict) – entity whose attributes are parsed

  • entry_type (str) – entity type

  • entry_id (str) – entry identifier

  • graph (dict) – graph object

pathme.wikipathways.json_rdf_parser.generate_empty_pathway_graph()

Generate an empty pathway graph object.

pathme.wikipathways.json_rdf_parser.parse_entries(entries)

Create the graph object that is finally returned, filled with the values obtained from the parser's statement calls (entry -> attributes -> values). First, the type and ID of each entry are obtained with match_entry, and the entry is handled according to the retrieved type. Entries of type ‘interactions’ are handled only by set_interaction; all other entries are parsed in depth through the successive levels (parse_attributes and parse_attribute_values).

Parameters

entries (list[dict]) – entries to parse

Return type

dict

pathme.wikipathways.json_rdf_parser.convert_json(graph: rdflib.graph.Graph)

Convert an imported rdflib graph object into a Python data structure (a list of dicts, one per entry).

Parameters

graph (rdflib.Graph) – graph object

Return type

list[dict]

pathme.wikipathways.json_rdf_parser.parse_pathway(pathway_path)

Import the given pathway from the text file resources into an rdflib graph object (import_pathway), apply the data type transformations (convert_json), and call the first statement of the parser, which returns a graph data structure (parse_entries). The retrieved graph is then converted to a NetworkX graph (convert_to_nx).

Parameters

pathway_path (str) – pathway identifier

Return type

networkx.MultiDiGraph