This is a draft of the introduction to my student Soroush Samidian’s Ph.D. thesis:

Reproducibility is a cornerstone of Science. To be truly reproducible, an experiment should be explicit and thorough in describing every stage of the analysis, starting with the initial question or hypothesis, continuing on through the methodology by which candidate data were selected and analyzed, and finishing with a fully-documented result, including all provenance information (which resource, which version, when, and why). As modern biology becomes increasingly in silico-based, many of these best practices in reproducibility are being managed with much higher efficiency. The emergence of analytical workflows as first-class referenceable and shareable objects in bioinformatics has led to a high level of precision in describing in silico “materials and methods”, as well as the ability to automate collection of highly detailed provenance information. However, the earlier stages in the scientific process – the posing of the hypothesis and the selection of candidate data – are still largely limited to human cognition; we pose our hypotheses in the form of sentences, and we often select and screen candidate data based on expert knowledge or intuition. This is particularly acute in the interface between clinical sciences and molecular sciences, where clinicians are the ultimate arbiters of patient phenotypic classification, often based entirely on their personal expert opinion, while in contrast molecular association studies depend on deeply understanding these classifications in order to make statistical links between phenotypic traits and molecular traits.

Recently, new standards have emerged that allow us to explicitly express “Knowledge”. In particular, the endorsement of the Ontology language OWL by the W3C has provided a global standard for knowledge representation which is showing particularly rapid adoption within the life sciences and health sciences communities.
Though there are numerous examples of Ontologies being used to describe “what is” (i.e. to describe a particular aspect of biological reality), we have found no examples of Ontologies being used, in practice, to describe “what might be” (i.e. a hypothetical, unproven view of biological reality). Given the constantly changing nature of “biological reality”, we find this distinction to be completely artificial, and this viewpoint motivates the thesis proposal described here.

We propose that, to achieve scientific rigor and reproducibility, it is important to consider, create, and evaluate approaches for explicitly representing hypotheses and phenotypic classification systems, particularly in the clinical domain. Moreover, we propose that the OWL language is an appropriate tool for representing these hypothetical and/or subjective perspectives in a concrete way. We will support these arguments with empirical and quantitative studies. We will then explore and evaluate the feasibility of the approach by attempting to define interfaces that support clinical researchers as they model their hypotheses and classification systems in the highly complex OWL language. Finally, we plan to demonstrate the utility of the approach by showing that, when expressed in OWL, many biomedical hypotheses can be automatically evaluated in silico using novel semantic workflow technologies developed in our laboratory.