This document describes the Solanaceae Genomics Resource project at Michigan State University. The project aims to integrate genomic and transcriptomic data from major Solanaceae crops like potato, tomato, tobacco, and pepper. Researchers are identifying orthologs and paralogs within the Solanaceae and comparing sequences to model plants. The goal is to provide a robust comparative genomics resource to enable broad data mining and analysis of Solanaceae sequences. All project data and analysis results are made freely available on their website.
Solanaceae Genomics Resource
Brett R. Whitty, C. Robin Buell
Michigan State University, Department of Plant Biology, East Lansing MI 48824
Collectively, the Solanaceae (a family which includes Potato, Tomato, Tobacco, Pepper, Eggplant and Petunia) are a valuable component of U.S. agriculture. The major Solanaceae crop species share both sequence identity and gene order, thereby providing the basis for leveraging genomic resources across taxa. Transcriptome and genome sequencing projects have been initiated for the major crop species, although none of the three genome initiatives (potato, tomato, tobacco) has yet released to the public a high-quality, finished, complete genome sequence. Thus, it is essential that all of the partial Solanaceae transcript and genome sequence be integrated at the family level and linked to other model dicot species to provide contextual information on the putative function of Solanaceae homologs.
In this project, we are working to identify putative orthologs, paralogs, and lineage-specific genes within the Solanaceae to facilitate intra- and inter-species comparisons. We also identify homologs of Solanaceae species within three dicot species (Arabidopsis, Poplar, Grapevine) to permit leveraging resources from these model species to the Solanaceae. We are working to generate comparative analyses, alignments, views, and displays of the Solanaceae. Overall, we provide a robust and integrated comparative genomics resource that permits broad and deep data-mining of Solanaceae sequences by the community.
This project was initiated January 1, 2008, and we continue to update project data quarterly and to develop additional resources and tools for the Solanaceae community. It is supported by the National Research Initiative (NRI) Plant Genome Program of the USDA National Institute of Food and Agriculture (NIFA) (2008-35300-18671).
All project data is made available through our web site: http://solanaceae.plantbiology.msu.edu
Project email address: sgr@plantbiology.msu.edu
Model Dicot Comparative Genome Databases
Alignments of Solanaceae transcript assemblies against model dicot (Arabidopsis, Grapevine, Poplar) genomic and polypeptide sequence are available for display and search in a GBrowse database.
Potato, Tomato and Tobacco Draft Genomes
Our analyses and databases include all public data releases from the three genome sequencing efforts in the Solanaceae. We obtain data as it is released to GenBank from the International Tomato Genome Sequencing Project, which includes gene models annotated by the project members, and from the International Potato Genome Sequencing Consortium (of which we are members), which to date has released assemblies with no gene annotations. Our annotation and analysis pipeline provides gene models for genes present on these assemblies, supplementary to any previously annotated gene models present in the public data.
A Community Resource
As a component of our project, we aim to provide a web portal that, in addition to presenting results from our comparative analyses, acts as a unified repository for genomic and transcriptomic data and related bioinformatic resources for the Solanaceae, thereby improving the accessibility of this data to the Solanaceae community.
Annotation/Analysis Pipelines
We retrieve all publicly available Solanaceae genomic sequences from GenBank, and the sequences are run through the GMOD MAKER gene annotation pipeline to provide a common set of evidence-supported gene model predictions; these supplement the models previously annotated (if any) on the public assemblies. Our transcriptomic analyses are performed on transcript assemblies generated by PlantGDB (PUTs). Some of the analyses we perform on genomic and transcriptomic sequence include:
• Ortholog/paralog prediction by best hit and OrthoMCL clustering
• SSR identification in transcript and genome sequence, and generation of primers (using Primer3)
• Identification of putative SNPs in transcript assemblies
• Alignment of PlantGDB-assembled Solanaceae transcripts (PUTs) to the genomic sequence using exonerate
• Alignment of UniProt's SwissProt & UniRef protein databases to the genomic sequence using exonerate
• BLASTP of Solanaceae gene models against model dicot proteomes (Arabidopsis, Grapevine, Poplar)
• InterProScan search on the models to identify functional domains
• Repeat feature prediction (using RepeatMasker)
• ncRNA feature prediction (using tRNAscan-SE and RNAmmer)
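As an illustration of the best-hit approach to ortholog prediction listed above, the following is a minimal sketch of a reciprocal best hit comparison between two species. It is not the project's pipeline code. It assumes BLAST tabular output (12 tab-separated columns with the bit score last), and the sequence identifiers, the `_row` helper, and the choice of the reciprocal variant of best hit are illustrative assumptions.

```python
import csv


def best_hits(blast_tab):
    """Map each query ID to the subject ID with the highest bit score.

    `blast_tab` is any iterable of lines in BLAST tabular format
    (12 tab-separated columns, bit score in the last column).
    """
    best = {}
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def _row(query, subject, bitscore):
    # Hypothetical helper: pads the nine unused columns with placeholders.
    return "\t".join([query, subject, "90.0", "100", "0", "0",
                      "1", "100", "1", "100", "1e-50", str(bitscore)])


# Made-up identifiers and scores, for illustration only.
potato_vs_tomato = [
    _row("pot1", "tom1", 540),
    _row("pot1", "tom2", 200),
    _row("pot2", "tom2", 310),
    _row("pot3", "tom1", 120),
]
tomato_vs_potato = [
    _row("tom1", "pot1", 538),
    _row("tom2", "pot2", 305),
    _row("tom2", "pot1", 190),
]

pairs = reciprocal_best_hits(potato_vs_tomato, tomato_vs_potato)
print(pairs)
```

Here `pot3` hits `tom1`, but `tom1` prefers `pot1`, so `pot3` is left without a putative ortholog. Clustering methods such as OrthoMCL are designed to handle the many-to-many relationships that a simple pairwise rule like this cannot.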
Integrated and Accessible Data
Available sequence data, analysis results, and tools for species in the Solanaceae are presented in centralized views on the project site to aid users in applying these resources in their research. At the genome level, our species overview page consolidates available sequence data, genome information and resources, and lists available analysis results and tools. At the transcript level, our gene overview page presents a summary of gene information and analyses, such as BLAST results, computationally predicted SNPs, SSRs, and orthology/paralogy, and links transcripts to other site resources, including our genome browsers.
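To give a sense of the kind of computation behind the predicted SSRs shown on the gene overview page, here is a minimal sketch of simple sequence repeat detection with regular expressions. It is not the project's actual method. The minimum repeat counts in `MIN_REPEATS` and the example sequence are illustrative assumptions, and primer design (done in the project with Primer3) is not covered.

```python
import re

# Motif length (bp) -> minimum number of tandem copies to report.
# These thresholds are illustrative assumptions, not project settings.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence):
    """Return sorted (start, end, motif, copies) tuples for perfect SSRs.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT, AAA);
            # those runs are reported under the shorter motif length instead.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Made-up sequence containing (AT)7 and (CAG)5.
example = "GCTA" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TTGA"
ssrs = find_ssrs(example)
for start, end, motif, copies in ssrs:
    print(f"{start}-{end}\t({motif}){copies}")
```

The sketch reports only perfect repeats of 2 to 6 bp. Imperfect or compound repeats would need a more tolerant search.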
Solanaceae Comparative Genome Database
Our database contains annotation and comparative data for all public Solanaceae genomic sequence assemblies. We currently use the GMOD Generic Genome Browser (GBrowse) to facilitate the web-based display and searching of our annotation and comparative analyses.
Potato Genome Sequencing Consortium Potato Draft Genome Browser
As members of the Potato Genome Sequencing Consortium, we are hosting the public Potato genome browser. Presently, the doubled monoploid Solanum phureja DM1-3 516R44 (CIP801092) v3.2 genome assembly and annotation are online. Visit http://potatogenome.net for details on this draft genome release. In the genome browser, all aligned Solanaceae transcript assemblies are linked to the full set of resources associated with those assemblies provided by the Solanaceae Genomics Resource site.
Upcoming Features for 2010/2011
We expect that the finished Potato and Tomato genomes will be released to the public sequence databases in the coming months. At that time, we will integrate the complete genomes into our existing resources and will make available additional tools and analysis results; one of these new tools will be a genome synteny viewer. We have produced a significant amount of RNA-Seq data from our participation in the Potato Genome Sequencing Consortium (PGSC, http://potatogenome.net) and the Solanaceae Coordinated Agricultural Project (SolCAP, http://solcap.msu.edu), and when publicly released it will be incorporated into the Solanaceae Genomics Resource databases and tools. This data will greatly expand our existing SNP database tool, and we will provide new tools for the query and display of expression data.