| www.hannohinsch.com | software design + construction | ||||
| JavaGene | |||||
|
JavaGene is a set of Java foundation classes for genomic sequence analysis that you can learn to use in an afternoon. It's particularly good at low-level sequence manipulations, such as one might need when writing gene finders, analyzing gene structure, or modeling molecular evolution. It offers little to those who are constructing annotation pipelines or integrating disparate data sources. JavaGene completely encapsulates tedious sequence arithmetic, sparing you pesky off-by-one bugs and edge-condition error checking, and generally makes working with stranded sequences a breeze. If you like working in a strongly object-oriented style, and your data is in Fasta and GFF formats, you should check it out.

I originally wrote this for my own use while studying bioinformatics at the University of Pennsylvania, and used it there for three good-sized research projects. The version here is a refactored subset. I'm posting it on the web in the hope that it might prove useful to fellow bioinformatics geeks.

Overview
▪ Simple! Documented! Includes sample programs!
▪ Encapsulates common sequence data types and operations, allowing an object-oriented programming style.
▪ Implements a rich set of methods to manipulate locations on a sequence, such as "sliding window" iterators, prefix(), suffix(), contains(), distance(), overlaps(), upstream(), union(), and many more like them.
▪ Transparently handles operations on forward and reverse strands.
▪ Understands Fasta format sequence files. Seamlessly supports both typical in-memory sequences and oversized, gigabyte-scale sequences (using memory-mapped I/O).
▪ Understands various flavors of GFF/GTF feature files. Supports selection of features such as genes, exons, etc., based on attributes. Can splice a sequence based on a list of features.
▪ AminoAcid utilities: Translate nucleotides to amino acids. Check for synonyms. Find Blosum62 distance.
▪ Nucleotide utilities: Complementation for both DNA and RNA sequences. Check for matches using the IUPAC "ambiguous" symbols, such as R and Y.

Check it out
1. Review the annotated sample programs:
▪ Essentials (getting started)
▪ GenesToProteins (a small but real program)
▪ Strands (working with stranded sequences)
▪ Bio (nucleotide and protein tools)
2. Review the online JavaDoc.
3. Download the latest release file, which includes a jar file, complete source, the sample programs, and the JavaDoc.

Downloads
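To give a feel for the kind of stranded sequence arithmetic the Overview describes, here is a minimal, self-contained sketch in plain Java. It does not use JavaGene and is not its API: the class StrandSketch and every method name in it are hypothetical, and the coordinate convention (0-based, half-open intervals counted from the 5' end of each strand) is an assumption made for this example only. The sample programs and the JavaDoc remain the place to see how JavaGene itself does these things.

```java
/** Illustrative sketch only; none of these names come from JavaGene. */
final class StrandSketch {

    // Each IUPAC nucleotide symbol and, at the same index, its complement.
    private static final String SYMBOLS     = "ACGTRYKMBVDHSWN";
    private static final String COMPLEMENTS = "TGCAYRMKVBHDSWN";

    /** Reverse complement of a DNA string written in upper-case IUPAC symbols. */
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            int k = SYMBOLS.indexOf(dna.charAt(i));
            if (k < 0) {
                throw new IllegalArgumentException("Not an IUPAC symbol: " + dna.charAt(i));
            }
            out.append(COMPLEMENTS.charAt(k));
        }
        return out.toString();
    }

    /** The concrete bases that an IUPAC symbol may stand for. */
    static String expand(char symbol) {
        switch (symbol) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': return "T";
            case 'R': return "AG";   // purine
            case 'Y': return "CT";   // pyrimidine
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";  // not A
            case 'D': return "AGT";  // not C
            case 'H': return "ACT";  // not G
            case 'V': return "ACG";  // not T
            case 'N': return "ACGT"; // any base
            default:
                throw new IllegalArgumentException("Not an IUPAC symbol: " + symbol);
        }
    }

    /** True if the concrete base is one of those the IUPAC symbol allows. */
    static boolean matches(char symbol, char base) {
        return expand(symbol).indexOf(base) >= 0;
    }

    /**
     * Bases in the half-open interval [start, end) of the reverse strand,
     * given only the forward strand. Assumed convention: reverse-strand
     * interval [start, end) maps to forward interval [n - end, n - start).
     */
    static String reverseStrandSubsequence(String forward, int start, int end) {
        int n = forward.length();
        return reverseComplement(forward.substring(n - end, n - start));
    }

    public static void main(String[] args) {
        String forward = "ATGCCC";
        System.out.println(reverseComplement(forward));               // GGGCAT
        System.out.println(reverseStrandSubsequence(forward, 3, 6));  // CAT
        System.out.println(matches('R', 'G'));                        // true
    }
}
```

The interval mapping in reverseStrandSubsequence is exactly the sort of index bookkeeping where off-by-one errors creep in, which is the motivation for keeping it behind a location abstraction instead of repeating it at every call site.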
FAQ

Why?
OK, I acknowledge the value of cars, trucks and airplanes, but sometimes I just want to ride my bicycle. JavaGene is a bicycle; its features were selected with an eye toward creating the smallest set that could arguably be called useful. Its domain is limited, but within that domain it works very well.

Compared to what?
Check out BioPerl, BioJava, BioPython and BioRuby, all at OpenBio.org. Decide for yourself.

Requirements?
JavaGene requires the J2SE 5.0 JDK, as it makes use of generic types and other 5.0 features. The javagene.jar file needs to be placed in your classpath. You will probably want to use the "-Xmx" switch on the JVM to increase the maximum RAM accessible to Java (I use -Xmx700m on my 512 MB laptop). I also suggest enabling assertions (the "-ea" switch).

Bugs?
Well, yes, this is software, so I'm sure there are, despite my best efforts. Please email me promptly at hhinsch (at) yahoo (dot) com if you find any.

Future?
Gapped sequence and multiple alignment classes are coming as soon as I can finish documenting and testing them.

Data?
I am a huge fan of the UCSC Genome Browser site. You can download your favorite genome in Fasta format, and nearly everything you see in the browser as feature files in GTF format (look for and use the "Table Browser"). Data in these formats works well with JavaGene. |
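If you want to confirm that the "-Xmx" and "-ea" switches mentioned under Requirements actually took effect, a small check like the following can help. This is a sketch in plain Java, not something shipped with JavaGene; the class name JvmCheck and its method names are hypothetical.

```java
/** Illustrative sketch only; not part of JavaGene. */
final class JvmCheck {

    /** True only when the JVM was started with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert runs only if assertions are on.
        assert enabled = true;
        return enabled;
    }

    /** Maximum heap the JVM will attempt to use, in megabytes (set by -Xmx). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Assertions enabled: " + assertionsEnabled());
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

Run it with the same switches you plan to use for your own program, for example `java -ea -Xmx700m JvmCheck`. The heap figure reported is typically close to, but not always exactly, the value passed to -Xmx.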
||||
| Copyright © 2005-2006 Hanno Hinsch | |||||