Non-parametric clustering of genomics data

Next-generation sequencing [1] is causing an explosion in the amount of data generated in genomics. The field is ripe for analysis with the tools of Big Data and Data Science.

While working with data in my current field, biophysics, I've recently developed an obsession with clustering, a subset of unsupervised learning. Specifically, I'm interested in clustering techniques that either don't require arbitrary parameter selection or can be adapted so that no human has to make that choice. Rather than having a user select parameters for analyzing the data, I want the data to speak for itself.
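To make "letting the data choose" concrete, here is a minimal sketch of one common way to take a parameter out of a human's hands: run k-means for several values of k and keep the k with the highest mean silhouette score. This is an illustration only, not necessarily the approach used later in this post. The function names (`kmeans`, `mean_silhouette`, `choose_k`), the `max_k` cap, and the toy points are all assumptions made up for the example.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from every seed chosen so far (no random state needed).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if own is None or not by_cluster:
            scores.append(0.0)  # singleton cluster: silhouette is defined as 0
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, max_k=6):
    """Return the k in [2, max_k] with the highest mean silhouette (max_k is an assumed cap)."""
    best_k, best_score = None, -1.0
    for k in range(2, min(max_k, len(points) - 1) + 1):
        score = mean_silhouette(points, kmeans(points, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Toy data (made up for illustration): three well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
print(choose_k(points))
```

On this toy data the silhouette criterion should settle on three clusters without anyone specifying k. Note that the range of candidate k values is still a choice, which is exactly the kind of leftover knob that makes truly parameter-free clustering hard.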

Feelings by Rodin

Last week I visited the National Gallery in Washington, D.C. Among the pieces, there were several Rodin sculptures. While there's nothing remarkable about a gallery having such sculptures, seeing them gave me pause. I've seen Rodin's sculptures in Palo Alto, D.C., London and Paris, and there's never just one or two. Rodin was prolific. He was constantly cranking out sculptures. As I reflected on how many Rodin pieces I had seen, I had a thought that many grad students in their late twenties, like me, have had: what am I doing with my life?

