Search | VHL Regional Portal

The khmer software package: enabling efficient nucleotide sequence analysis.

Crusoe, Michael R; Alameldin, Hussien F; Awad, Sherine; Boucher, Elmar; Caldwell, Adam; Cartwright, Reed; Charbonneau, Amanda; Constantinides, Bede; Edvenson, Greg; Fay, Scott; Fenton, Jacob; Fenzl, Thomas; Fish, Jordan; Garcia-Gutierrez, Leonor; Garland, Phillip; Gluck, Jonathan; González, Iván; Guermond, Sarah; Guo, Jiarong; Gupta, Aditi; Herr, Joshua R; Howe, Adina; Hyer, Alex; Härpfer, Andreas; Irber, Luiz; Kidd, Rhys; Lin, David; Lippi, Justin; Mansour, Tamer; McA'Nulty, Pamela; McDonald, Eric; Mizzi, Jessica; Murray, Kevin D; Nahum, Joshua R; Nanlohy, Kaben; Nederbragt, Alexander Johan; Ortiz-Zuazaga, Humberto; Ory, Jeramia; Pell, Jason; Pepe-Ranney, Charles; Russ, Zachary N; Schwarz, Erich; Scott, Camille; Seaman, Josiah; Sievert, Scott; Simpson, Jared; Skennerton, Connor T; Spencer, James; Srinivasan, Ramakrishnan; Standage, Daniel.

F1000Res ; 4: 900, 2015.

Article in English | MEDLINE | ID: mdl-26535114

ABSTRACT

The khmer package is a freely available software library for working efficiently with fixed-length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
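
To illustrate what a probabilistic k-mer counting data structure can look like, the following is a minimal sketch of a Count-Min Sketch over k-mers in plain Python. It is not khmer's code or API: the class name `KmerCountMinSketch`, the helper `kmers`, and the `width` and `depth` parameters are hypothetical, chosen only for this example. The structure may overestimate a count when k-mers collide, but it never underestimates one.

```python
import hashlib


class KmerCountMinSketch:
    """Illustrative Count-Min Sketch for k-mer counts (not khmer's implementation)."""

    def __init__(self, width=1000, depth=4):
        # width and depth are hypothetical tuning parameters:
        # more columns and rows mean fewer collisions but more memory.
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, kmer):
        # One independent hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode("ascii"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, kmer):
        # Increment one counter in every row.
        for row, col in self._slots(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # The minimum across rows is the least-inflated estimate.
        return min(self.tables[row][col] for row, col in self._slots(kmer))


def kmers(sequence, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


sketch = KmerCountMinSketch()
for kmer in kmers("ACGTACGTAC", 4):
    sketch.add(kmer)

# "ACGT" occurs twice in the sequence, so the estimate is at least 2.
estimate = sketch.count("ACGT")
```

Memory use is fixed by `width * depth` regardless of how many distinct k-mers are seen, which is the trade-off that makes this family of structures attractive for large sequencing datasets.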

A novel pharmaceutical-ACO collaboration: the Merck/Heritage Provider Network open innovation challenge.

Bhandari, Aman; Chatterjee, Arnaub; Holoubek, Sara; Powers, Brian; Gluck, Jonathan; Jain, Sachin H.

Am J Manag Care ; 20(10 Spec No): E4, 2014 Jul.

Article in English | MEDLINE | ID: mdl-25549555

Subject(s)

Accountable Care Organizations; Cooperative Behavior; Disease Management; Drug Industry; Patient Compliance; Diabetes Mellitus/therapy; Heart Diseases/therapy; Humans; Organizational Innovation; United States

De-identification methods for open health data: the case of the Heritage Health Prize claims dataset.

El Emam, Khaled; Arbuckle, Luk; Koru, Gunes; Eze, Benjamin; Gaudette, Lisa; Neri, Emilio; Rose, Sean; Howard, Jeremy; Gluck, Jonathan.

J Med Internet Res ; 14(1): e33, 2012 Feb 27.

Article in English | MEDLINE | ID: mdl-22370452

ABSTRACT

BACKGROUND: There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mining competition to predict, by using claims data, the number of days patients will be hospitalized in a subsequent year. The winner will be the team or individual with the most accurate model past a threshold accuracy, and will receive a US $3 million cash prize. HHP began on April 4, 2011, and ends on April 3, 2013.
OBJECTIVE: To de-identify the claims data used in the HHP competition and ensure that it meets the requirements in the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
METHODS: We defined a threshold risk consistent with the HIPAA Privacy Rule Safe Harbor standard for disclosing the competition dataset. Three plausible re-identification attacks that can be executed on these data were identified. For each attack the re-identification probability was evaluated. If it was deemed too high, then a new de-identification algorithm was applied to reduce the risk to an acceptable level. We performed an actual evaluation of re-identification risk using simulated attacks and matching experiments to confirm the results of the de-identification and to test sensitivity to assumptions. The main metric used to evaluate re-identification risk was the probability that a record in the HHP data can be re-identified given an attempted attack.
RESULTS: An evaluation of the de-identified dataset estimated that the probability of re-identifying an individual was 0.0084, below the 0.05 probability threshold specified for the competition. The risk was robust to violations of our initial assumptions.
CONCLUSIONS: It was possible to ensure that the probability of re-identification for a large longitudinal dataset was acceptably low when it was released for a global user community in support of an analytics competition. This is an example of, and methodology for, achieving open data principles for longitudinal health data.
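
As a rough illustration of how a re-identification probability can be compared against a threshold such as the 0.05 value quoted above, the sketch below uses a common simplification: records that share the same quasi-identifier values form an equivalence class, and each record's risk is taken as one over its class size. This is an assumption made for the example, not the method described in the abstract; the function name `reidentification_risk`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return maximum and average per-record risk, assuming risk = 1 / class size."""
    # Build the quasi-identifier key for every record.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    # Count how many records share each key (the equivalence class size).
    class_sizes = Counter(keys)
    per_record = [1.0 / class_sizes[key] for key in keys]
    return {
        "max": max(per_record),
        "average": sum(per_record) / len(per_record),
    }


# Hypothetical toy data: four records share one class, one record is unique.
records = [
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "50-59", "sex": "M"},
]

risk = reidentification_risk(records, ["age_band", "sex"])

# Threshold taken from the abstract; the unique record pushes the maximum risk above it.
THRESHOLD = 0.05
needs_more_generalization = risk["max"] > THRESHOLD
```

In this toy example the single unique record drives the maximum risk to 1, which is the kind of result that would prompt further generalization or suppression before release.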

Subject(s)

Medical Records Systems, Computerized; Patient Identification Systems; Health Insurance Portability and Accountability Act; United States
