Course Syllabus





Week 1: January 19

Teaching philosophy and class logistics.

Emerging scientific data trends and impacts on information systems. Data science and data informatics landscape.


Chris Anderson, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete"

Petr Keil, Data-driven science is a failure of imagination

Katie L. Burke, (2014), A Safety Net for Scientific Data, American Scientist, Volume 102, Number 1, Page: 6, DOI: 10.1511/2014.106.6

Data Sharing and Management Snafu in 3 Short Acts

Week 2: January 26

What are scientific and technical data?

A theory of data: is one needed?

Scientific descriptions and models.

Edwards: data processing frictions.

Scientific data forms, formats, properties, and sources.


George H. Mealy, Another look at data, Fall Joint Computer Conference (1967), pp. 525-534.

Paul N. Edwards, A Vast Machine (2010), Introduction, Chapter 1 and Chapter 5. [Textbook]

Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources. National Research Council (1995) ISBN: 0-309-52106-8. pp 10-32. [Textbook]

Week 3: February 2

Scientific data form and format variations.

Requirements of scientific data collections.


William Kent, The Many Forms of a Single Fact (1989)

Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. National Science Board (NSF) (2005), pp. 17-23.

Jim Gray, et al., Scientific Data Management in the Coming Decade. ACM SIGMOD Record (Dec 2005), Vol. 34, No. 4, pp. 34-41.

Anastasia Ailamaki, et al., Managing Scientific Data. Communications of the ACM (June 2010), Vol. 53, No. 8, pp. 68-78. DOI:10.1145/1743546.1743568

Week 4: February 9

Metrics and assessments of specific datasets: describing, evaluating, and documenting discovery and usability of S&T data.

The "9+1 Properties of Usable Data Sets".

Evaluating and documenting S&T data quality.

Assignment: find and evaluate the usability of three different S&T datasets. Document discovery methods and findings on the wiki.

Due: February 16.

Peter Gleick (2013), Accuracy, precision, and significance: The misery of cholera

Lide, D., Data Quality - More Important than ever in the Internet Age, CODATA Data Science Journal, Volume 6, 23 December 2007.

Gillis, Justin (2010), A Scientist, His Work and a Climate Reckoning, The New York Times, Dec. 21, 2010

Zednik, Stephan, Characterizing quality for science data products, Tetherless World Weblog, 30 December 2011.

IMF Data Quality Framework 2003 [Canvas]

Week 5: February 16

Evaluation of S&T metadata and markup.

Assignment: evaluate the "9+1 properties of usable data sets" metrics. Post your evaluations on the Canvas wiki page.

Due: February 23.

William Kent, The Unsolvable Identity Problem (2003), Proceedings Extreme Markup Languages 2003

Introduction to 'Taxonomy for the twenty-first century'. H.C.J. Godfray & S. Knapp (2004), Phil. Trans. R. Soc. Lond. B 359, pp. 559-569. DOI: 10.1098/rstb.2003.1457 [Canvas]

Allen H. Renear, Simone Sacchi, and Karen M. Wickett (2010) Definitions of dataset in the scientific and technical literature.

Ann Green, Stuart Macdonald, Robin Rice (2009) Policy-making for Research Data in Repositories: A Guide pp. 3-4, 13-17.

Week 6: February 23

Introduction to the Semantic Web and Linked Data initiatives in the sciences (basic terminology and concepts).

Ontological commitments and associated disorders.

Which Semantic Web? Catherine C. Marshall and Frank M. Shipman, Hypertext '03 (2003).

Elin K. Jacob (2003) Ontologies and the Semantic Web, Bulletin of the American Society for Information Science and Technology, April/May 2003, pp. 19-22.

Tim Berners-Lee (2006, 2009) Linked Data

Lorna M. Campbell and Sheila MacNeill (2010), The Semantic Web, Linked and Open Data, JISC cetis Briefing Paper

Week 7: March 2

Introduction to the Semantic Web and Linked Data initiatives in the sciences 2: (RDF and how to use linked data).

Assignment: Concept map model of an S&T data informatics topic area. Create a Cmap and present in class.

Due: March 9.

Joshua Tauberer (2008), What is RDF and what is it good for?

David Shotton, Linked Data 101: Linked Data and Libraries. [Canvas]

M. Mayernik, J. Phillips, and E. Nienhouse (2016), Linking Publications and Data: Challenges, Trends, and Opportunities

Week 8: March 9

Introduction to the Semantic Web and Linked Data initiatives in the sciences 3: (concept models and ontologies as models).

Paul N. Edwards, A Vast Machine (2010), Chapter 10 Making Data Global. [Textbook]

Giancarlo Guizzardi (2010), Theoretical foundations and engineering tools for building ontologies as reference conceptual models, Semantic Web, Vol. 1 (2010), pp. 3-10. DOI 10.3233/SW-2010-001 [2015-12-30: note that PDF provided by the journal does not contain any publication information. Better citation URL:]




Week 9: March 23

Introduction to the Semantic Web and Linked Data initiatives in the sciences 4: the CiTO Citation Typing Ontology.

Assignment: rOpenSci library assignment.

Due: March 30.

David Shotton, CiTO, the Citation Typing Ontology, Journal of Biomedical Semantics (2010), 1(Suppl 1):S6 [early version; good description]

SPAR Ontologies: Citation Typing Ontology (2012-2015) - Examples.

Week 10: March 30

Semantic Web and Linked S&T data: examples of ontology usage.

Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article, David Shotton, Katie Portwin, Graham Klyne, and Alastair Miles, PLoS Computational Biology 5(4) e1000361 (2009). doi: 10.1371/journal.pcbi.1000361.

RPI Tetherless World Constellation Product Quality Model Primer (n.d.)

Matthew S. Mayernik, et al. (2017) Building Geoscience Semantic Web Applications Using Established Ontologies

Week 11: April 6

Standards for S&T data citation and publication.

A Vast Machine: Standards as Social Technology, P.N. Edwards (2004). Science 304: 827-828

Mark D. Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Nature Scientific Data 3:160018 (2016). DOI: 10.1038/sdata.2016.18

Ruth Duerr (2014), Data Citation and You: The new AGU guidelines for data citation. (slideshare)

Week 12: April 13

Preservation of S&T data.

Assignment: FAIR principles as metrics.

Due: April 20.

Berman, F. (2008). Got data? A guide to data preservation in the information age. CACM 51 (12): 50-56. doi:10.1145/1409360.1409376

John Timmer, "Preserving science: what to do with raw research material?"

John Timmer, "Preserving science: what data do we keep? What do we discard?"

Data Preservation in High Energy Physics. David M. South, on behalf of the ICFA DPHEP Study Group, 17 January 2011, arXiv:1101.3186v1

Week 13: April 20

Preservation of S&T data and building a research data commons.

Goodman, A, et al., 10 Simple Rules for the Care and Feeding of Scientific Data, 9 January 2014, arXiv:1401.2134v1

Jerome H. Reichman, Paul F. Uhlir, and Tom Dedeurwaerdere, "A Digitally Integrated Infrastructure for Microbial Data and Information", Chapter 8, Section II.C. - Section III.C. (inclusive).

Week 14: April 27

Data provenance and data processing curation.

Paul N. Edwards, A Vast Machine (2010), Chapter 11: Data Wars. [Textbook]

Goble, C., Position statement on Provenance, Workflow, and Annotations for Bioinformatics. [Canvas]

Goble, C., Stevens R., Hull D., Wolstencroft K., and Lopez R. (2008). Data curation + process curation = data integration + science. Briefings in Bioinformatics 9 (6): 506-517, doi:10.1093/bib/bbn034

Week 15: May 4

What is scientific and technical data informatics?

Project presentations.

