Posts tagged OWL

OWL Domain Models as Abstract Workflows

The publication from the Wilkinson lab, OWL Domain Models as Abstract Workflows, describes how we use SADI and SHARE to generate a workflow based on a biological domain model. The upshot is that, by creating a biological model of a piece of data of interest, SHARE can cobble together a series of SADI services that will find or generate data matching that model.

OWL as hypothesis — reproducible in silico science

This is a draft of the introduction to my student Soroush Samidian’s Ph.D. thesis:

Reproducibility is a cornerstone of Science. To be truly reproducible, an experiment should be explicit and thorough in describing every stage of the analysis, starting with the initial question or hypothesis, continuing on through the methodology by which candidate data were selected and analyzed, and finishing with a fully-documented result, including all provenance information (which resource, which version, when, and why). As modern biology becomes increasingly in silico-based, many of these best practices in reproducibility are being managed with much higher efficiency. The emergence of analytical workflows as first-class referenceable and shareable objects in bioinformatics has led to a high level of precision in describing in silico “materials and methods”, as well as the ability to automate collection of highly detailed provenance information. However, the earlier stages in the scientific process – the posing of the hypothesis and the selection of candidate data – are still largely limited to human cognition; we pose our hypotheses in the form of sentences, and we often select and screen candidate data based on expert knowledge or intuition. This is particularly acute in the interface between clinical sciences and molecular sciences, where clinicians are the ultimate arbiters of patient phenotypic classification, often based entirely on their personal expert opinion, while in contrast molecular association studies depend on deeply understanding these classifications in order to make statistical links between phenotypic traits and molecular traits. (read more…)

CardioSHARE walkthrough

Take a look at this query, which can be executed in the experimental OWL 2 CardioSHARE client.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patients: <>
PREFIX bmi: <>
SELECT ?patient ?bmi
FROM <>
WHERE {
  ?patient rdf:type patients:AtRiskPatient .
  ?patient bmi:BMI ?bmi
}
We’re going to walk through what the CardioSHARE client does when this query is executed. Apologies if the anthropomorphic, intentional phrasing bothers you, but it simplifies the language considerably.

  1. The client is initialized with an empty knowledge base backed by an OWL reasoner. In this case, we’re using Pellet because the query refers to a class that uses an OWL 2 construct that isn’t supported by the other reasoners available to us.
  2. The client examines the FROM clause and notices that the named graph is a URL. It fetches the URL — using content negotiation to request RDF/XML — and stores the result in its knowledge base.
  3. The client attempts to order the query clauses so as to minimize the number of service calls and the amount of data that must be transferred over the network. For more detail on the query optimization process, consult Ben Vandervalk’s Master’s thesis.
  4. In this particular case, the client processes the ?patient rdf:type patients:AtRiskPatient clause first. This is an rdf:type clause, so the client assumes the object is the URI of an OWL class. There is no information in the client’s knowledge base about the AtRiskPatient class, so the client fetches the class URI using content negotiation as above. If the class URI were not also a URL (and so couldn’t be fetched), the class would have to be defined in a document specified in a FROM clause.
  5. The client decomposes the AtRiskPatient class into its component restrictions. In this case, there is only one restriction: that some values of the property BMI are greater than 25.
  6. The client queries the SADI registry for services that can attach the BMI property. It finds one service: calculateBMI.
  7. The client would like to use the candidates for the ?patient variable as input to the calculateBMI service, but this is the first time it has encountered that variable and there are no candidates. That being the case, the client examines its knowledge base for instances of the service’s input class, loading the class definition if necessary as above. SADI requires that input and output classes are identified by URLs that resolve to the appropriate definition, so we know this will work. The client adds the instances it finds to the candidates for the ?patient variable.
  8. The client invokes the calculateBMI service, using the candidates of the ?patient variable as input. It assembles the minimal RDF needed to satisfy the service’s input class definition for each input and POSTs that RDF to the service URL. The RDF that the service returns is added to the client’s knowledge base.
  9. The client moves on to process the ?patient bmi:BMI ?bmi clause. It queries the SADI registry for services that can attach the BMI property and finds the same service calculateBMI as above.
  10. The client invokes the calculateBMI service, using the candidates of the ?patient variable as input. Actually, no it doesn’t, because I glossed over a part of the procedure the last time the client did this: when it invokes a service, the client tracks which individuals it sent to that service; before it assembles the RDF that it’s going to send to a service, the client excludes any individuals it has already sent. So what actually happens here is that the client excludes all of the individuals it was going to send and just moves on.
  11. At this point, the client has run out of query clauses to process, so it turns the populated knowledge base over to a conventional SPARQL query engine that executes the original query.
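The AtRiskPatient class that gets decomposed in step 5 isn’t shown in the post, but a minimal sketch of it in Turtle might look like the following. The namespace URLs and the exact property name are assumptions (the query’s prefix declarations are elided above); the greater-than-25 constraint is expressed as an OWL 2 datatype restriction using the xsd:minExclusive facet, which is the kind of construct that forces the choice of Pellet in step 1.

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix patients: <http://example.org/patients#> .  # hypothetical namespace
@prefix bmi:      <http://example.org/bmi#> .       # hypothetical namespace

patients:AtRiskPatient a owl:Class ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty bmi:BMI ;
        owl:someValuesFrom [
            a rdfs:Datatype ;
            owl:onDatatype xsd:decimal ;
            owl:withRestrictions ( [ xsd:minExclusive "25"^^xsd:decimal ] )
        ]
    ] .

The single owl:someValuesFrom restriction on bmi:BMI here is what drives the registry lookup in step 6.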
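Step 8 says the client assembles “the minimal RDF needed to satisfy the service’s input class definition” for each input. As a hypothetical sketch (the real input class of calculateBMI isn’t shown here, so the property names, values, and namespaces are invented for illustration), the exchange for one patient might look like:

@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
@prefix patients: <http://example.org/patients#> .      # hypothetical namespace
@prefix measure:  <http://example.org/measurements#> .  # hypothetical namespace
@prefix bmi:      <http://example.org/bmi#> .           # hypothetical namespace

# POSTed to the service: the minimal RDF satisfying its input class,
# assuming that class requires a height and a weight for each patient.
patients:p1 measure:height "1.80"^^xsd:decimal ;
            measure:weight "95.0"^^xsd:decimal .

# Returned by the service and merged into the client's knowledge base:
patients:p1 bmi:BMI "29.3"^^xsd:decimal .

Once the returned triples are in the knowledge base, the reasoner can classify patients:p1 as an AtRiskPatient, and the final SPARQL execution in step 11 picks it up.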

Questions? Comments? Pop over to the CardioSHARE Google Group.