At a recent Informatics Lunch meeting, surrounded by a host of dedicated Python users, I confessed to being somewhat ignorant of the business drivers behind using Python in a bioinformatics setting. Although I have used Python on occasion, it was usually to solve a customer’s problem, not because it was my first choice when reaching into the toolbox. My feeling is that Python became the de facto standard for bioinformatics scripting more through historical accident than by duking it out with other languages for the top of the technology pile.
Oddly enough, much of that conversation revolved around Python’s shortcomings rather than its strengths. These included the backwards incompatibility between Python 2 and Python 3, and the fact that compiling Python into native code is tricky and requires you to keep platform-specific differences in mind. Luckily, neither of these problems has been inflicted on Java or Groovy users.
Some of the typical use cases cited for Python include:
It’s scriptable
Groovy has always been scriptable. The Groovy console lets you compose and execute scripts. It can also be compiled to Java byte-code and thus run on any JVM. With the release of Java 9, Java itself is also scriptable using the JShell scripting console.
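To make the scripting point concrete, here is a minimal sketch in plain Java. Since Java 11, a single source file can be launched directly with the java command, with no separate compile step, and the same method body can be pasted into JShell. The class name GcContent and the sample sequence are made up for illustration and were not part of the meeting discussion.

```java
public class GcContent {
    // Fraction of G and C bases in a DNA sequence (case-insensitive).
    // Returns 0.0 for an empty sequence to avoid dividing by zero.
    static double gcContent(String sequence) {
        if (sequence.isEmpty()) {
            return 0.0;
        }
        long gc = sequence.toUpperCase().chars()
                .filter(base -> base == 'G' || base == 'C')
                .count();
        return (double) gc / sequence.length();
    }

    public static void main(String[] args) {
        // Use the first command-line argument if given, else a made-up sample.
        String sequence = args.length > 0 ? args[0] : "ATGCGC";
        System.out.printf("GC content: %.2f%n", gcContent(sequence));
    }
}
```

Saved as GcContent.java, this can be run with "java GcContent.java ATGCGC" on Java 11 or later.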
I can read data files with it
The Apache POI library provides support for reading and writing Microsoft Office files like PowerPoint, Word and Excel. Here’s a simple example that shows you how to iterate through the cells in an Excel spreadsheet.
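    import java.io.File;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;

    // placeholder value: always scan at least this many columns per row
    final int MY_MINIMUM_COLUMN_COUNT = 5;
    // read the Excel file
    Workbook wb = WorkbookFactory.create(new File("MyExcel.xls"));
    // get the first sheet
    Sheet sheet = wb.getSheetAt(0);
    // Decide which rows to process
    int rowStart = Math.min(15, sheet.getFirstRowNum());
    int rowEnd = Math.max(1400, sheet.getLastRowNum());
    for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
        Row r = sheet.getRow(rowNum);
        if (r == null) {
            // This whole row is empty
            // Handle it as needed
            continue;
        }
        int lastColumn = Math.max(r.getLastCellNum(), MY_MINIMUM_COLUMN_COUNT);
        for (int cn = 0; cn < lastColumn; cn++) {
            // Recent POI versions expose the policy on Row.MissingCellPolicy
            Cell c = r.getCell(cn, Row.MissingCellPolicy.RETURN_BLANK_AS_NULL);
            if (c == null) {
                // The spreadsheet is empty in this cell
            } else {
                // Do something useful with the cell's contents
            }
        }
    }
    // release the file handle when done
    wb.close();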
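Not every data file is a spreadsheet. For plain tab-delimited text, the JDK alone is enough. The following is a minimal sketch, not a replacement for a proper parser: the class name DelimitedRows, the method parseRows, and the sample rows are all hypothetical, and quoting or escaping rules are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DelimitedRows {
    // Split each line on tabs, skipping lines that contain only whitespace.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String[]> parseRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            rows.add(line.split("\t", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample lines; for a real file, java.nio.file.Files.readAllLines
        // returns a List<String> that can be passed in the same way.
        List<String> lines = Arrays.asList(
                "gene\tchromosome",
                "",
                "BRCA1\t17",
                "TP53\t17");
        for (String[] row : parseRows(lines)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```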
I can query databases with it
Groovy can make use of any JDBC data source, NoSQL database, graph database, or cloud database. Here’s a simple example that shows you how to execute a JDBC query and iterate through the results.
    import groovy.sql.Sql
    // create a database connection to an in-memory hsql database
    def db = [url:'jdbc:hsqldb:mem:testDB', user:'sa', password:'', driver:'org.hsqldb.jdbc.JDBCDriver']
    def sql = Sql.newInstance(db.url, db.user, db.password, db.driver)
    // query the data table called 'project': start at row 2 and return at most 2 rows
    sql.eachRow('select * from PROJECT', 2, 2) { row ->
        println "${row.name.padRight(10)} ($row.url)"
    }
I can do text mining with it
The WEKA library is a Java machine-learning toolkit that includes support for text mining, such as filters that turn raw text into word vectors. Since the library is Java-based, we can easily add it to a project and call it from a script. There are plenty of examples that show how to use WEKA here.
I can graph data with it
The Java UI library JavaFX includes a variety of charting components, including pie charts, line charts, and area charts. The details are a little beyond the scope of this post, but Oracle provides some great tutorials on the subject here. And here’s an example from Stack Overflow that shows how to output the chart as a PNG image without displaying it.
Conclusion
These are just some of the use cases that were cited during the meeting. So, I’d like to throw it out there to the audience: if you have some additional use cases for Python, I’d love to hear more about them in the comments section.
