Articles tagged software

Things I'm Glad I Learned

Skills, concepts, techniques, and models
published: Monday, January 18, 2021

edited: January 21, 2021, 12:00

category: Misc.
tags: software-carpentry, teaching, python, programming, open science, microbiome, rant, evolution, pipelines, software, git, containers
WARNING: This post was written with haste and therefore contains all kinds of typos, spelling errors, grammatical issues, and delusions of grandeur, wisdom, and writing ability.

This post is intended as a living document (a gratitude journal of sorts) of some things that I'm glad I learned. I expect many of the items on this list will be relevant to computational biology, but that may change in the future.

The big idea is that for every item on this list I (A) am glad that someone introduced me to it, and (B) think more people should know about it. This post is my chance to "pay it backwards", as it were; maybe someone else will be grateful for something they find for the first time on this list.

It may also double as an inspiration list for future posts.

My goal is to write a small blurb for each item …

Tutorial: Reproducible data analysis pipelines using Snakemake
published: Sunday, November 19, 2017

category: Computing
tags: teaching, programming, python, pipelines, bioinformatics, software
In many areas of natural and social science, as well as engineering, data analysis involves a series of transformations: filtering, aggregating, comparing to theoretical models, culminating in the visualization and communication of results. This process is rarely static, however, and components of the analysis pipeline are frequently subject to replacement and refinement, resulting in challenges for reproducing computational results. Describing data analysis as a directed network of transformations has proven useful for translating between human intuition and computer automation. In the past I've evangelized extensively for GNU Make, which takes advantage of this graph representation to enable incremental builds and parallelization.
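To make the graph idea concrete, here is a minimal Python sketch of how a dependency graph plus file timestamps can decide what needs rebuilding. It is an illustration only, not how GNU Make or Snakemake is implemented; the function name `stale_targets`, the file names, and the integer "timestamps" are all made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stale_targets(deps, mtimes):
    """List the targets that need rebuilding, prerequisites first.

    deps maps each target to the files it is built from.
    mtimes maps each existing file to its modification time.
    """
    stale = []
    # static_order() yields every file after all of its prerequisites.
    for target in TopologicalSorter(deps).static_order():
        prereqs = deps.get(target, ())
        if not prereqs:
            continue  # a source file: nothing to build
        missing = target not in mtimes
        outdated = any(
            p in stale or mtimes.get(p, 0) > mtimes.get(target, 0)
            for p in prereqs
        )
        if missing or outdated:
            stale.append(target)
    return stale


# Hypothetical three-step pipeline: raw -> filtered -> summary -> plot
deps = {
    "filtered.tsv": ["raw.tsv"],
    "summary.tsv": ["filtered.tsv"],
    "plot.pdf": ["summary.tsv"],
}
# raw.tsv was modified after filtered.tsv was last built
mtimes = {"raw.tsv": 5, "filtered.tsv": 3, "summary.tsv": 4, "plot.pdf": 6}
print(stale_targets(deps, mtimes))
```

Because a stale prerequisite marks everything downstream as stale, a change to one input triggers only the steps that depend on it, which is the essence of an incremental build.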

Snakemake is a next-generation tool based on this concept and designed specifically for bioinformatics and other complex, computationally challenging analyses. I've started using Snakemake for my own data analysis projects, and I've found it to be a consistent improvement, enabling more complex pipelines with fewer of the "hacks" that …

Tutorial: Reproducible bioinformatics pipelines using GNU Make
published: Friday, March 4, 2016

edited: November 21, 2017, 09:30

category: Computing
tags: software-carpentry, teaching, programming, make, pipelines, bioinformatics, software
WARNING: Because of the Markdown rendering of this blog, tab characters have been replaced with 4 spaces in code blocks. For this reason, the makefile code will not work when copied directly from the post. Instead, you must first replace all 4-space indents with a tab character.
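If you would rather not fix the indentation by hand, a small script can do it. The following is a minimal Python sketch, assuming that leading runs of 4 spaces in the copied text are always indentation that should become tabs; the function name `restore_tabs` is made up for this example.

```python
import re


def restore_tabs(makefile_text, width=4):
    """Turn each leading group of `width` spaces into one tab character."""

    def fix(match):
        tabs, remainder = divmod(len(match.group(0)), width)
        # Keep any leftover spaces that do not fill a whole indent level.
        return "\t" * tabs + " " * remainder

    # With re.MULTILINE, ^ matches at the start of every line.
    return re.sub(r"^ +", fix, makefile_text, flags=re.MULTILINE)


# Hypothetical makefile snippet as it would be copied from the post
copied = "all: out.txt\n    echo hi > out.txt\n"
print(repr(restore_tabs(copied)))
```

Only leading spaces are touched, so spaces inside a recipe line are left alone.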

For most projects with moderate to intense data analysis you should consider using Make. Some day I'll write a post telling you why, but for now check out this post by Zachary M. Jones¹. If you're already convinced, or just want to see what it's all about, read on.

This post is a clone of a tutorial that I wrote for Titus Brown's week-long Bioinformatics Workshop at UC Davis's Bodega Marine Laboratory in February 2016. For now, the live tutorial lives in a GitHub repository, although I eventually want to merge all of the good parts into the Software Carpentry Make lesson …

Compiling SciPy on RHEL6
published: Monday, May 20, 2013

category: Computing
tags: python, hpc, software, scipy, linux
Within the past two years I've discovered something interesting about myself (...actually really, really boring about myself): I can be happily entertained for hours on end setting up my computational environment just right. I find that it gives me a similar type of satisfaction to cataloguing my music collection. I guess you could call it a hobby.

Usually this entails installing the usual suspects (NumPy, Pandas, IPython, matplotlib, etc.) in a Python virtual environment. When I'm particularly into it (which is always), I'll also compile the Python distribution itself. I've had several opportunities to indulge this pastime, most recently in setting up my research pipeline on the Flux high-performance compute cluster at the University of Michigan.

Installing NumPy is usually no trouble at all, but for some reason (if you know, please tell me), SciPy has always given me a "BlasNotFoundError" when installing on the Red Hat Enterprise Linux distros …

PyMake I: Another GNU Make clone
published: Tuesday, May 7, 2013

edited: March 4, 2016, 10:00

category: Computing
tags: python, software, development, make, pipelines, bioinformatics
(Edit 1): ~~This is the first of two posts about my program PyMake. I'll post the link to Part II here when I've written it.~~ While I still agree with many of the views expressed in this piece, I have changed my thinking on Makefiles.

(Edit 2): ~~I'll post a new post about the topic when I take the time to write it.~~ I've written a tutorial on using _Make for reproducible data analysis_.

I am an aspiring but unskilled (not yet skilled?) computer geek. You can observe this for yourself by watching me fumble my way through vim configuration, multi-threading/processing in Python, and git merges.

Rarely do I actually feel like my products are worth sharing with the wider world. The only reason I have a GitHub account is personal convenience and absolute confidence that no one else will ever look at it besides me …
