Genomics Workspace¶
Genomics Workspace is an open-source project created by the i5k Workspace at the USDA National Agricultural Library (NAL).
In this project, we produced a Django website that provides common sequence search tools, including BLAST, HMMER, and Clustal.
By leveraging the Django admin site and a task queue built on RabbitMQ and Celery, the website makes it much easier to manage sequence databases and provide services to end users.
All source code for Genomics Workspace is in our GitHub repo.
Note
You can try genomics workspace on our live services:
- BLAST: https://i5k.nal.usda.gov/webapp/blast/
- HMMER: https://i5k.nal.usda.gov/webapp/hmmer/
- Clustal: https://i5k.nal.usda.gov/webapp/clustal/
The live services listed above run a customized version of Genomics Workspace. Its source code is in a separate GitHub repo: NAL-genomics-workspace.
Pre-requisites¶
- git
- Python 2.7
- npm
- RabbitMQ
- PostgreSQL
- mod_wsgi (optional, only for production)
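Before starting either setup guide, it can help to confirm that the command-line prerequisites are reachable. The following is a minimal sketch, compatible with both Python 2.7 and 3, that searches PATH for each tool.

The command names in PREREQUISITES are assumptions: psql stands in for PostgreSQL, and rabbitmq-server stands in for RabbitMQ. mod_wsgi is left out because it is an Apache module rather than a command.

```python
import os

# Hypothetical command names for the prerequisites listed above.
PREREQUISITES = ["git", "python2.7", "npm", "rabbitmq-server", "psql"]


def find_executable(name, path=None):
    """Return the full path of `name` on PATH, or None if it is not found.

    PATH is walked by hand so the function behaves the same on Python 2.7
    (which lacks shutil.which) and Python 3.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_prerequisites(names=PREREQUISITES, path=None):
    """List the commands from `names` that cannot be found on PATH."""
    return [name for name in names if find_executable(name, path) is None]
```

Calling missing_prerequisites() with no arguments checks the current PATH. rabbitmq-server may be installed in an sbin directory that is not on a regular user's PATH, so a miss there does not necessarily mean it is not installed.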
Setup Guide¶
This guide describes how to set up Genomics Workspace on CentOS and macOS.
Setup Guide (CentOS)¶
This setup guide is for CentOS. It has been tested on CentOS 6.7 and CentOS 7.2, and it should also work on other modern Linux distributions (adapting the package-manager commands as needed).
Note: The following variables may be used in path names; substitute as appropriate:
<user> : the name of the user performing the setup.
<user-home> : the user's home directory, e.g., /home/<user>
<git-home> : the genomics-workspace repository directory (the one containing the `.git/` folder).
Project Applications¶
Clone or refresh the genomics-workspace:
git clone https://github.com/NAL-i5K/genomics-workspace
# Or if the repository exists:
cd <git-home>
git pull
Python¶
Install necessary packages:
sudo yum -y groupinstall "Development tools"
sudo yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel
sudo yum -y install readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel python-devel
Install python 2.7.13 from source:
cd <user-home>
wget http://www.python.org/ftp/python/2.7.13/Python-2.7.13.tar.xz
tar -xf Python-2.7.13.tar.xz
# Configure as a shared library:
cd Python-2.7.13
./configure --prefix=/usr/local --enable-unicode=ucs4 --enable-shared LDFLAGS="-Wl,-rpath /usr/local/lib"
# Compile and install:
make
sudo make altinstall
# Update PATH:
export PATH="/usr/local/bin:$PATH"
# Check the Python version (expected output: Python 2.7.13):
python2.7 -V
# Cleanup if desired:
cd ..
rm -rf Python-2.7.13.tar.xz Python-2.7.13
Install pip and virtualenv:
wget https://bootstrap.pypa.io/ez_setup.py
sudo /usr/local/bin/python2.7 ez_setup.py
wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
sudo /usr/local/bin/python2.7 get-pip.py
sudo /usr/local/bin/pip2.7 install virtualenv
Build a separate virtualenv:
cd <git-home>
# Create a virtual environment called py2.7 and activate:
virtualenv -p python2.7 py2.7
source py2.7/bin/activate
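Later steps, such as pip install -r requirements.txt and the Celery commands, assume the py2.7 virtualenv is active. A small check like the sketch below can catch a forgotten source py2.7/bin/activate. It relies only on the sys attributes that virtualenv and venv are known to set; the function name is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv.

    Legacy virtualenv (< 20) sets sys.real_prefix. Python 3's venv and newer
    virtualenv releases instead make sys.prefix differ from sys.base_prefix.
    Both markers are checked.
    """
    if hasattr(sys, "real_prefix"):
        return True
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix
```

Run with py2.7/bin/python, this should return True. Run with /usr/local/bin/python2.7 directly, it should return False.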
RabbitMQ¶
Install RabbitMQ Server:
cd <user-home>
# Install Extra Packages for Enterprise Linux (EPEL) for 64-bit RHEL/CentOS 6.
# The epel-release-6-8 package covers all CentOS 6.* releases:
wget https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -ivh epel-release-6-8.noarch.rpm
# For RHEL/CentOS 7.*, use instead:
# wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-10.noarch.rpm
# and change the other commands accordingly.
# Install Erlang:
sudo yum -y install erlang
# Install RabbitMQ server:
sudo yum -y install rabbitmq-server
# To start the daemon by default when system boots run:
sudo chkconfig rabbitmq-server on
# Start the server:
sudo /sbin/service rabbitmq-server start
# Clean up:
rm epel-release-6-8.noarch.rpm
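Celery needs a running broker, so it is worth confirming that RabbitMQ is accepting connections before starting the workers. The sketch below opens a TCP connection to RabbitMQ's default AMQP port, 5672.

It assumes the broker runs on the same machine with the default port. A successful connection only shows that something is listening there, not that it is RabbitMQ.

```python
import socket


def port_is_open(host="127.0.0.1", port=5672, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    5672 is RabbitMQ's default AMQP port.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True
```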
Memcached¶
Install memcached, enable it at boot, and start it:
sudo yum -y install memcached
# Set to start at boot time:
sudo chkconfig memcached on
# Start the server:
sudo service memcached start
Database¶
Install PostgreSQL:
# Exclude the distribution's own PostgreSQL packages from the CentOS base repository:
echo 'exclude=postgresql*' | sudo tee -a /etc/yum.repos.d/CentOS-Base.repo
# Install the PostgreSQL Global Development Group (PGDG) RPM file:
sudo yum -y install http://yum.postgresql.org/9.5/redhat/rhel-6-x86_64/pgdg-centos95-9.5-2.noarch.rpm
# Install PostgreSQL 9.5:
sudo yum -y install postgresql95-server postgresql95-contrib postgresql95-devel
# Initialize (uses default data directory: /var/lib/pgsql):
sudo service postgresql-9.5 initdb
# Startup at boot:
sudo chkconfig postgresql-9.5 on
# Control:
# sudo service postgresql-9.5 <command>
#
# where <command> can be:
#
# start : start the database.
# stop : stop the database.
# restart : stop/start the database; used to read changes to core configuration files.
# reload : reload pg_hba.conf file while keeping database running.
# Start:
sudo service postgresql-9.5 start
#
# (To remove everything: sudo yum erase postgresql95*)
#
# Create django database and user:
sudo su - postgres
psql
# At the prompt 'postgres=#' enter:
create database django;
create user django;
grant all on database django to django;
ALTER USER django CREATEDB;
# Connect to django database:
\c django
# Create extension hstore:
create extension hstore;
# Exit psql and postgres user:
\q
exit
# If needed, adjust authentication in pg_hba.conf (see Trouble Shooting).
# Add the PostgreSQL 9.5 binaries to PATH:
cd <git-home>
export PATH=/usr/pgsql-9.5/bin:$PATH
# Restart:
sudo service postgresql-9.5 restart
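The database and role created above still have to be referenced from the Django settings. The fragment below is a hypothetical sketch of a DATABASES entry for i5k/settings.py. It assumes the names used above (database django, user django, no password) and the default port 5432. The repository's own settings file may already define this differently, so compare before copying.

With an empty HOST, the connection goes through the local Unix socket. Under a peer rule in pg_hba.conf (see Trouble Shooting), that only succeeds when your operating-system user name matches USER.

```python
# Hypothetical excerpt for i5k/settings.py; NAME and USER match the database
# and role created with psql above.
DATABASES = {
    "default": {
        # Backend name for Django >= 1.9; older releases use
        # "django.db.backends.postgresql_psycopg2".
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "django",
        "USER": "django",
        # Empty HOST: connect over the local Unix socket, so the "local"
        # records in pg_hba.conf decide the authentication method.
        "HOST": "",
        "PORT": "5432",
    }
}
```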
Python Modules and Packages¶
Install additional Python packages:
cd <git-home>
pip install -r requirements.txt
Chrome Driver¶
- Install ChromeDriver from https://sites.google.com/a/chromium.org/chromedriver/downloads
- Add to PATH
Celery¶
Start Celery (run each command in its own terminal, from <git-home> with the virtualenv activated):
# Run the Celery worker manually:
celery -A i5k worker --loglevel=info --concurrency=3
# Run Celery beat manually as well:
celery -A i5k beat --loglevel=info
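If you would rather not keep two terminals open, the two Celery commands can be launched from one small script. This is a sketch that uses only the standard subprocess module. CELERY_COMMANDS mirrors the commands above, and run_together is a hypothetical helper, not part of the repository. Run it from <git-home> with the virtualenv active so that celery and the i5k package can be found.

```python
import subprocess

# The two commands shown above, as argument lists.
CELERY_COMMANDS = [
    ["celery", "-A", "i5k", "worker", "--loglevel=info", "--concurrency=3"],
    ["celery", "-A", "i5k", "beat", "--loglevel=info"],
]


def run_together(commands=CELERY_COMMANDS, cwd=None):
    """Start every command, wait for all of them, and return their exit codes.

    On Ctrl-C, any process that is still running is terminated before
    the exit codes are collected.
    """
    procs = [subprocess.Popen(cmd, cwd=cwd) for cmd in commands]
    try:
        return [p.wait() for p in procs]
    except KeyboardInterrupt:
        for p in procs:
            if p.poll() is None:
                p.terminate()
        return [p.wait() for p in procs]
```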
Install Binary Files and Front-end Scripts¶
This step installs binary files (for BLAST, HMMER, and Clustal) and front-end scripts (.js and .css files):
npm run build
Setup Guide (macOS)¶
This setup guide has been tested on macOS Sierra (10.12) and macOS High Sierra (10.13), and it should also work on other recent macOS versions.
Note: The following variables may be used in path names; substitute as appropriate:
<user> : the name of the user performing the setup.
<user-home> : the user's home directory, e.g., /Users/<user>
<git-home> : the genomics-workspace repository directory (the one containing the `.git/` folder).
Project Applications¶
Clone or refresh the genomics-workspace:
git clone https://github.com/NAL-i5K/genomics-workspace
# Or if the repository exists:
cd <git-home>
git pull
Homebrew¶
We recommend using Homebrew as the package manager. Installation steps can be found at https://brew.sh/.
Python¶
Install virtualenv:
pip install virtualenv
Build a separate virtualenv:
# Change into the repository directory:
cd <git-home>
# Create a virtual environment called py2.7 and activate:
virtualenv -p python2.7 py2.7
source py2.7/bin/activate
RabbitMQ¶
Install and run RabbitMQ Server:
brew install rabbitmq
# Make sure /usr/local/sbin is in your $PATH
rabbitmq-server
Database¶
Install PostgreSQL:
brew install postgres
psql postgres
# At the prompt 'postgres=#' enter:
create database django;
create user django;
grant all on database django to django;
ALTER USER django CREATEDB;
# Connect to django database:
\c django
# Create extension hstore:
create extension hstore;
# Exit psql:
\q
Python Modules and Packages¶
Install additional Python packages:
cd <git-home>
pip install -r requirements.txt
Chrome Driver¶
- Install ChromeDriver from https://sites.google.com/a/chromium.org/chromedriver/downloads
- Add to PATH
Celery¶
Start Celery (run each command in its own terminal, from <git-home> with the virtualenv activated):
# Run the Celery worker manually:
celery -A i5k worker --loglevel=info --concurrency=3
# Run Celery beat manually as well:
celery -A i5k beat --loglevel=info
Install Binary Files and Front-end Scripts¶
This step installs binary files (for BLAST, HMMER, and Clustal) and front-end scripts (.js and .css files):
npm run build
Advanced Setup¶
JBrowse/Apollo Linkout Integration¶
In Genomics Workspace, we provide a linkout integration between BLAST and JBrowse/Apollo.
You can go directly to the corresponding sequence location by clicking entries in the BLAST result table.
To enable it, set ENABLE_JBROWSE_INTEGRATION in i5k/settings.py:
ENABLE_JBROWSE_INTEGRATION = True
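The guide does not describe how the linkout URLs are built. As an illustration only: JBrowse 1 accepts a loc query parameter of the form seqid:start..end, so a link for a BLAST hit could be assembled as in the sketch below. jbrowse_link, the base URL, and the flank option are all hypothetical and are not taken from the genomics-workspace code.

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2.7
    from urllib import urlencode


def jbrowse_link(base_url, seqid, start, end, flank=0):
    """Build a JBrowse 1 style URL that opens seqid at start..end.

    Coordinates are sorted first; `flank` widens the window on both sides
    (never below position 1).
    """
    low, high = sorted((int(start), int(end)))
    low = max(1, low - flank)
    high = high + flank
    loc = "%s:%d..%d" % (seqid, low, high)
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode([("loc", loc)])
```

Sorting the coordinates matters because BLAST reports minus-strand subject hits with the start coordinate greater than the end.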
User Guide¶
BLAST, HMMER, and Clustal are the main functions of genomics-workspace. Each of these functions is implemented as a separate Django app.
In this section, we go through how to configure each application.
In short, you need to configure databases for BLAST and HMMER, but you don't need to configure anything for Clustal.
Note
This page is for users who want to set up genomics-workspace by creating a new admin user and configuring it through the admin page. If you want to know how to use the services provided by genomics-workspace, see these tutorials:
Getting started¶
To get started, set up the following:
- Set up an admin account:
  - If you don't have one yet, run `python manage.py createsuperuser` and follow the instructions shown in your terminal.
  - Start the development server with `python manage.py runserver`, then browse to the admin page of genomics-workspace and log in. By default, the admin page is at http://127.0.0.1:8000/admin/.
- Create these directories if you don't have them:
  - media/blast/db
  - media/hmmer/db
- Create sequence types in blast/sequence-type. We recommend creating these three:
  - Peptide/Protein
  - Nucleotide/Genome Assembly
  - Nucleotide/Transcript
- Copy all FASTA files to be formatted for BLAST to media/blast/db.
- Copy protein FASTA files to be formatted for HMMER to media/hmmer/db.
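The directory step above can also be scripted. The sketch below creates media/blast/db and media/hmmer/db under <git-home> and is safe to re-run. ensure_media_dirs is a hypothetical helper. It avoids os.makedirs(..., exist_ok=True) because that argument does not exist on Python 2.7.

```python
import errno
import os

# The two directories listed above, relative to <git-home>.
MEDIA_DIRS = [
    os.path.join("media", "blast", "db"),
    os.path.join("media", "hmmer", "db"),
]


def ensure_media_dirs(git_home, subdirs=MEDIA_DIRS):
    """Create each directory under git_home unless it already exists.

    Returns the absolute paths. An existing directory is not an error;
    an existing file with the same name is.
    """
    paths = []
    for sub in subdirs:
        path = os.path.abspath(os.path.join(git_home, sub))
        try:
            os.makedirs(path)
        except OSError as exc:
            if exc.errno != errno.EEXIST or not os.path.isdir(path):
                raise
        paths.append(path)
    return paths
```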
BLAST Database Configuration¶
Manually creating a BLAST database¶
- Add an organism (click Organism in the sidebar, then click Add organism):
  - Display name should be the scientific name.
  - Short name is used by the system as an abbreviation.
  - Descriptions and NCBI taxa ID are filled in automatically.
- Add Sequence
- Add a BLAST DB:
  - Choose the Organism.
  - Choose the Type (sequence type).
  - Enter the location of the FASTA file in FASTA file path (it should be in <git-home>/media/blast/db/).
  - Enter a Title (shown on the BLAST page).
  - Enter the Descriptions.
  - Check is shown; if it is not checked, this database will not be shown on the BLAST page.
  - Save.
- Browse to http://127.0.0.1:8000/blast/; you should see the page with your dataset listed.
Creating a BLAST database via command line¶
An admin user can add or remove data from the genomics-workspace database via the command line interface. Here, we describe how to use commands to interact with the database.
Note
The order of the steps is important; run them in the order shown.
- To add an organism:
  python manage.py addorganism [genus] [species]
  (e.g. python manage.py addorganism Apis mellifera)
- To add a FASTA file to the BLAST application:
  python manage.py addblast [genus] [species] -t [type] -f [path of fasta file] -d [description]
  (e.g. python manage.py addblast Apis mellifera -t nucleotide Genome Assembly -f media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna -d Apis mellifera genome assembly, Amel_HAv3.1)
  - [type] should be one of the sequence types you set up earlier, e.g. “peptide Protein”, “nucleotide Genome Assembly” or “nucleotide Transcript”.
  - [description] will be the FASTA file description in the web interface. If this argument is omitted, the program uses the FASTA file name. Example descriptions are “[genus] [species] genome assembly, [assembly name]”, “[genus] [species] [annotation name], peptides”, “[genus] [species] [annotation name], transcripts” or “[genus] [species] [annotation name], CDS”.
- To make the BLAST database (via makeblastdb):
  python manage.py blast_utility [path of fasta file] -m
  (e.g. python manage.py blast_utility media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna -m)
- To populate the genomics-workspace sequences table:
  python manage.py blast_utility [path of fasta file] -p
  (e.g. python manage.py blast_utility media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna -p)
- To show the BLAST database in the web interface (BLAST databases are not shown by default):
  python manage.py blast_shown [path of fasta file] -shown 'true'
  (e.g. python manage.py blast_shown media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna -shown 'true')
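Because the order matters, the five commands can be generated and run as one sequence. This is a sketch, not a tool shipped with the repository; blast_setup_commands and run_commands are hypothetical helpers. Multi-word values are split on whitespace, exactly as the shell splits the unquoted examples above. If the organism has already been added, drop the first command.

```python
import subprocess


def blast_setup_commands(genus, species, seq_type, fasta_path, description=None):
    """Return the manage.py commands from the steps above, in the required order.

    seq_type and description are split on whitespace, matching how the
    shell splits the unquoted examples in this guide.
    """
    addblast = (["python", "manage.py", "addblast", genus, species, "-t"]
                + seq_type.split() + ["-f", fasta_path])
    if description:
        addblast += ["-d"] + description.split()
    return [
        ["python", "manage.py", "addorganism", genus, species],
        addblast,
        ["python", "manage.py", "blast_utility", fasta_path, "-m"],
        ["python", "manage.py", "blast_utility", fasta_path, "-p"],
        ["python", "manage.py", "blast_shown", fasta_path, "-shown", "true"],
    ]


def run_commands(commands, cwd):
    """Run each command from cwd (<git-home>); stop at the first failure.

    subprocess.check_call raises CalledProcessError on a non-zero exit.
    """
    for cmd in commands:
        subprocess.check_call(cmd, cwd=cwd)
```

For example, run_commands(blast_setup_commands("Apis", "mellifera", "nucleotide Genome Assembly", "media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna", "Apis mellifera genome assembly, Amel_HAv3.1"), cwd=".") issues the example commands above when run from <git-home>.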
HMMER Database Configuration¶
As with BLAST, HMMER databases must be configured before they can be searched.
Go to the Django admin page and click Hmmer in the left menu bar. You need to create an HMMER database instance (Hmmer dbs) for each FASTA file.
Manually creating a HMMER database¶
- Choose the Organism.
- Enter the location of the peptide FASTA file in FASTA file path.
- Enter a Title (shown on the HMMER page).
- Enter the Descriptions.
- Check is shown; if it is not checked, this database will not be shown on the HMMER page.
- Save.
Creating a HMMER database via command line¶
An admin user can add or remove data from the genomics-workspace database via the command line interface. Here, we describe how to use commands to interact with the database.
- To add an organism (not necessary if the organism has already been added):
  python manage.py addorganism [genus] [species]
  (e.g. python manage.py addorganism Apis mellifera)
- To add an HMMER database:
  python manage.py addhmmer [genus] [species] -f [path of fasta file] -d [description]
  (e.g. python manage.py addhmmer Apis mellifera -f media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna -d Apis mellifera Apis_mellifera_Annotation_Release_103, peptides)
  - [description] will be the FASTA file description in the web interface. If this argument is omitted, the program uses the FASTA file name. Example description: “[genus] [species] [annotation name], peptides”
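HMMER databases here are meant to be built from peptide FASTA files, so it can be worth checking a file before running addhmmer. The sketch below is a rough composition heuristic of our own, not part of genomics-workspace or HMMER. If fewer than 90% of the residues are A, C, G, T, U, N or gaps, the file is treated as protein. The threshold is an assumption, and very short or unusual sequences can fool it.

```python
NUCLEOTIDE_CHARS = set("ACGTUN-")


def looks_like_protein(lines, threshold=0.9):
    """Guess whether FASTA text holds protein rather than nucleotide sequence.

    Header lines (starting with ">") and blank lines are ignored. Returns
    True when the fraction of nucleotide-like characters is below
    `threshold`. Raises ValueError if there is no sequence at all.
    """
    total = 0
    nucleotide = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            total += 1
            if ch in NUCLEOTIDE_CHARS:
                nucleotide += 1
    if total == 0:
        raise ValueError("no sequence lines found")
    return float(nucleotide) / total < threshold
```

Usage: open the FASTA file and pass the file object, e.g. with open(path) as handle: looks_like_protein(handle).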
Organism and Database deletion¶
Organism, BLAST and HMMER databases can be deleted after configuration via the command line interface. Here, we describe the commands for deleting them.
- To delete an organism:
  python manage.py delete -o [genus] [species]
  (e.g. python manage.py delete -o Apis mellifera)
- To delete a BLAST database:
  python manage.py delete -b [path of fasta file]
  (e.g. python manage.py delete -b media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna)
- To delete an HMMER database:
  python manage.py delete -h [path of fasta file]
  (e.g. python manage.py delete -h media/blast/db/GCF_003254395.2_Amel_HAv3.1_genomic.fna)
How to Deploy¶
In short, you need to set up the following tools and services:
- Apache HTTP server
- mod_wsgi
- RabbitMQ
- Celery and celerybeat, running in daemon mode
Because Genomics Workspace is a standard Django website, deploying it is no different from deploying any other Django project. We recommend deploying it with Apache and mod_wsgi.
You may also want to read the Django project's documentation on deployment.
Apache HTTP server and mod_wsgi¶
See the Django documentation. Example Apache and mod_wsgi configuration files are also available in our GitHub repo.
RabbitMQ¶
Use the rabbitmq-server command.
Celery and celerybeat¶
Here are example setup steps for Linux.
Copy files:
# When using CentOS 7.*, copy celeryd.sysconfig and celerybeat.sysconfig to /etc/default instead.
sudo cp celeryd /etc/init.d
sudo cp celerybeat /etc/init.d
sudo cp celeryd.sysconfig /etc/sysconfig/celeryd
sudo cp celerybeat.sysconfig /etc/sysconfig/celerybeat
Edit /etc/sysconfig/celeryd:
CELERYD_CHDIR="<git-home>"
CELERYD_MULTI="<git-home>/py2.7/bin/celery multi"
Edit /etc/sysconfig/celerybeat as follows:
CELERYBEAT_CHDIR="<git-home>"
CELERY_BIN="<git-home>/py2.7/bin/celery"
Set both to run as daemons:
sudo chkconfig celeryd on
sudo chkconfig celerybeat on
For more details, or for setup on macOS, see the Celery documentation. The example files mentioned above (celery*) are also in our GitHub repo.
Trouble Shooting¶
Q: I get an error message like FATAL: Ident authentication failed. How can I fix this?
A: This is caused by the authentication settings of the PostgreSQL database.
Try modifying the config file pg_hba.conf.
For example, in PostgreSQL 9.5 the file is at /var/lib/pgsql/9.5/data/pg_hba.conf.
Make sure the relevant part of it looks something like the following, then reload PostgreSQL (sudo service postgresql-9.5 reload):
local all all peer
host all all 127.0.0.1/32 ident
host all all ::1/128 md5
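PostgreSQL applies the first pg_hba.conf record that matches a connection, so it helps to see the records in order with comments stripped. The sketch below is a simplified parser of our own. It assumes the whitespace-separated CIDR form shown above and does not handle quoted fields or the separate address/netmask form.

```python
def parse_pg_hba(lines):
    """Return pg_hba.conf records as (type, database, user, address, method).

    Comments and blank lines are skipped. "local" records have no address
    field, so None is used for it. Options after the method are ignored.
    The result keeps file order, which is the order PostgreSQL checks.
    """
    records = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            conn_type, database, user, method = fields[:4]
            address = None
        else:
            conn_type, database, user, address, method = fields[:5]
        records.append((conn_type, database, user, address, method))
    return records
```

For the three lines above, it returns peer for local Unix-socket connections, ident for IPv4 connections from 127.0.0.1, and md5 for IPv6 connections from ::1.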
About i5k Workspace at NAL¶
The i5k Workspace at NAL is a platform for communities around ‘orphaned’ arthropod genome projects to access, visualize, curate and disseminate their data.
For more information, please see the website of the i5k Workspace@NAL.