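I've noticed many people on this board sharing bioinformatics knowledge, and there must be some experts here too, so I'd like to ask a few questions. Discussion is welcome.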
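I currently have 40,000 gene accession numbers and RefSeq IDs (as shown in the figure). I want to use these two existing identifiers, e.g. BE985939 and NM_008248, to get the corresponding UniGene cluster, Mm.425 (on NCBI: enter BE985939 or NM_008248, search Gene, click LinkOut and choose UniGene). But I need to convert them in batch; entering them by hand is clearly not feasible. My ultimate goal is to find annotation for these genes (I have a program someone else wrote that can look up annotation by Clone ID or UniGene ID). Could the experts here tell me how to do the conversion in batch? Many thanks.

I also have an Excel question, again shown in the figure, about the plateposition column. In the second row, for example, I meant to enter 1E12 (well E12 of plate 1), but Excel automatically displays it as 1000000000000 (reading it as 1 × 10^12 in scientific notation). I tried switching the cells to text format (and prefixing the value with '), but neither turns 1000000000000 back into 1E12. How can I fix these errors in batch? Many thanks.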
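A sketch of a batch repair for the plateposition column, assuming the column has been exported (e.g. to CSV) so a script can read the cell values. Excel reads an entry such as 1E12 as the number 10^12, so the digits before the trailing zeros give the plate and the number of trailing zeros gives the well. The helper `restore_plate_position` is hypothetical, not an Excel feature, and it assumes wells E1 to E12 and plate numbers that are not multiples of 10 (plate 10, well E5 and plate 1, well E6 both become 10^6, so those cannot be told apart).

```python
from decimal import Decimal, InvalidOperation


def restore_plate_position(cell, pad_well=False):
    """Rebuild a '<plate>E<well>' label that Excel stored as a number.

    Hypothetical helper. Assumptions:
      * the original entry was <plate>E<well>, e.g. "1E12" = plate 1, well E12;
      * the plate number is not a multiple of 10 (otherwise the split is ambiguous);
      * wells run from E1 to E12, as on a 96-well plate.
    """
    text = str(cell).strip()
    try:
        number = Decimal(text)  # accepts "1000000000000", "1E+12", "1000000000000.0"
    except InvalidOperation:
        return text  # not numeric (e.g. "3B05"): Excel left it alone
    if not number.is_finite() or number != number.to_integral_value() or number <= 0:
        return text
    digits = str(int(number))
    plate = digits.rstrip("0")
    well = len(digits) - len(plate)  # count of trailing zeros = exponent = well number
    if not 1 <= well <= 12:
        return text  # does not look like a converted plate position
    well_label = f"{well:02d}" if pad_well else str(well)
    return f"{plate}E{well_label}"
```

Apply it to every value in the column, e.g. `[restore_plate_position(v) for v in column]`, and set `pad_well=True` if the sheet used labels such as 2E05. To stop Excel converting new entries in the first place, format the column as Text before typing, or type the apostrophe as part of the entry ('1E12); changing the format after the value has been converted does not bring the original text back.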
I've also attached part of an email exchange below; does it help solve the problem?
Hi,
I'm working on a project called WebQTL for the University of Tennessee,
Memphis (http://www.webqtl.org/), and we're adding information based on
the UCSC genome browser gene location data. However, what we have at the
moment is just the Genbank accession number ("BC056929," for example) and
associated data (which chromosome, location, etc.).
We would like to be able to list *only* actual (or suspected) genes, or
at least give gene names for the Genbank items that correspond to named
genes.
However, so far I have been unable to find any way to download gene names
along with location/ chromosome. If there is an easy way to either
download gene names along with accession numbers, or to download a table
that gives a common key (like the gene ID) for both gene names and
accession numbers, I would really appreciate it if someone could point it
out.
Basically, I just want to be able to take an accession number and get a
name / geneID (when applicable):
e.g. Genbank accession number BC066152 --> gene name="Rb1cc1" and
geneID=12421 (from NCBI's web site, but I entered it by hand, which is
not viable for a large number of hits)
Just being able to download the complete information with gene name /
geneID / location would also be very helpful.
Thanks a lot,
Alex Williams
________________________________________________________________________
Alex,
It seems there might be two levels of analysis here, if I understand
the problem. The first, which I think the genome folks (I am not one) will
comment on, is finding the genomic location of selected known genes using the
table browser.
The second would be to use your genbank numbers to look up the unigene
associated with that genbank--this has the potential to allow you to
assign many more gene names, but the concept of a gene then becomes
unigene, rather than known gene (from UCSC). From unigene, you could
translate again to locuslink and get more information about your
"gene". I'm not sure how much of this (if any) can be done with the
table browser, but there is some information in the underlying database
tables to facilitate at least part of this analysis.
The first method will get you nicely-behaved genes, but may limit the
number of genbanks that you can look up. The second has the potential
to find many more genes, but each gene is built up as a cluster, and
that has some limitations.
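The second method amounts to two lookups chained together. A minimal sketch, assuming two tab-separated mapping files (GenBank accession to UniGene cluster, and UniGene cluster to LocusLink ID); the file layout and helper names are hypothetical. Grouping by cluster shows the limitation mentioned above: several GenBank records can end up as a single cluster "gene".

```python
from collections import defaultdict


def load_pairs(lines):
    """Parse 'key<TAB>value' lines from a (hypothetical) two-column mapping file."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        mapping[fields[0]] = fields[1]
    return mapping


def translate(accessions, acc_to_cluster, cluster_to_locus):
    """GenBank accession -> UniGene cluster -> LocusLink ID (None where a step is missing)."""
    result = {}
    for acc in accessions:
        cluster = acc_to_cluster.get(acc)
        locus = cluster_to_locus.get(cluster) if cluster is not None else None
        result[acc] = (cluster, locus)
    return result


def accessions_per_cluster(result):
    """Group accessions by cluster: each 'gene' here is really a cluster of sequences."""
    groups = defaultdict(list)
    for acc, (cluster, _locus) in result.items():
        if cluster is not None:
            groups[cluster].append(acc)
    return dict(groups)
```

With the original poster's example, BE985939 and NM_008248 both land in Mm.425, so two accessions collapse into one entry in `accessions_per_cluster`.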
----------------------------------------------------------------------------
Hi Alex,
Here is some information to supplement other responses to your question. It
looks like you have a list of mouse mRNA accession IDs from GenBank and want
to generate a corresponding list of gene names. You can retrieve this
information via the Table Browser or by downloading our data & writing a
script.
Our non-positional kgXref table cross-references a variety of IDs &
accessions, including Known Gene, mRNAs, Swiss-Prot proteins, gene symbols,
RefSeq IDs, NCBI proteins, and the gene descriptions. If you want to match
GenBank mRNA accessions with gene names, this is a good table to use. For
example, to look up the BC066152 accession that you listed below, you would:
1. Select the kgXref table from the non-positional tables list, then click
Advanced Query.
2. Type BC066152 into the mRNA text box, then click Get Results.
The output will show you the equivalent accessions/IDs, including Rb1cc1
(which is listed under the geneSymbol column). To configure the results to
show data from only selected columns, select the "Tab-separated, Choose
fields" output option and click the boxes for the field data you'd like to
display.
Similarly, you can paste a list of accessions (separated by spaces) into the
mRNA text box to generate information for multiple accessions.
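For a list as long as 40,000 accessions, pasting into the mRNA box means splitting the list first. A sketch that de-duplicates the list and joins it into space-separated batches; `space_separated_batches` is a hypothetical helper and the default batch size of 500 is an arbitrary guess, not a documented Table Browser limit. At that size 40,000 unique accessions would still take 80 pastes, which is why downloading the table and scripting the lookup is more practical for a list this long.

```python
def space_separated_batches(accessions, batch_size=500):
    """Join accessions into space-separated strings of at most batch_size IDs.

    batch_size is an assumption; adjust it to whatever the text box accepts.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    seen = set()
    unique = []
    for acc in accessions:
        acc = acc.strip()
        if acc and acc not in seen:  # drop blanks and duplicates, keep input order
            seen.add(acc)
            unique.append(acc)
    return [" ".join(unique[i:i + batch_size]) for i in range(0, len(unique), batch_size)]
```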
Unfortunately, we do not yet provide batch query support for non-positional
tables, so you will not be able to load in a file containing a list of
accessions. However, you can get around this by downloading the kgXref table
and writing a script that retrieves the information.
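A sketch of the download-and-script approach. It assumes kgXref.txt has been downloaded as a tab-separated file with no header line and that its columns are in the order given in `KGXREF_COLUMNS` (kgID, mRNA, spID, spDisplayID, geneSymbol, refseq, protAcc, description); check the table's description page before relying on that order. As a precaution it strips a trailing version suffix such as .1 from both sides, and it reports accessions that are not found separately, since UCSC filters some GenBank mRNAs out.

```python
import csv

# Assumed column order of kgXref.txt; verify against the table description.
KGXREF_COLUMNS = ["kgID", "mRNA", "spID", "spDisplayID", "geneSymbol",
                  "refseq", "protAcc", "description"]


def strip_version(accession):
    """'BC066152.1' -> 'BC066152' (in case only one side carries versions)."""
    return accession.strip().split(".")[0]


def load_kgxref(lines):
    """Index kgXref rows by mRNA accession; lines are tab-separated, no header."""
    by_mrna = {}
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) < len(KGXREF_COLUMNS):
            continue  # skip blank or short lines
        row = dict(zip(KGXREF_COLUMNS, fields))
        by_mrna[strip_version(row["mRNA"])] = row
    return by_mrna


def lookup(accessions, by_mrna):
    """Return ({accession: (geneSymbol, refseq, description)}, [accessions not found])."""
    found, missing = {}, []
    for acc in accessions:
        row = by_mrna.get(strip_version(acc))
        if row is None:
            missing.append(acc)  # e.g. an mRNA that UCSC filtered out
        else:
            found[acc] = (row["geneSymbol"], row["refseq"], row["description"])
    return found, missing
```

Typical use, with hypothetical file names: open kgXref.txt with `newline=""`, pass the file object to `load_kgxref`, then call `lookup` with the accessions read from your own list, one per line.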
You may discover that you can't find some of your accessions in the kgXref
table. We filter GenBank mRNAs based on certain criteria (see the mRNA track
description page in the Genome Browser for more details). An example of this
is the BC056929 mRNA you've listed below.
-Donna
--------------------------------------------------------------------------------
Hello, Alex.
Check out our table-browser utility.
Looks like knownGene would do it.
Also you might want refFlat.
Click on the tables link at the top menu
of the UCSC Genome Browser page.
Choose Human May2004 if prompted.
Choose knownGene (or refFlat) from the Positional tables dropdown.
Select the "Position" radio button and enter "genome" into the text field.
Click the "Get all fields" button.
Enjoy.
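If you save the refFlat output to a file instead of only viewing it, a short script can index it by transcript accession to give the gene name and location together. A sketch, assuming the usual refFlat column order (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) and that a Table Browser download may begin with a header line starting with #; `index_refflat` and `RefFlatHit` are hypothetical names. Coordinates are kept exactly as UCSC stores them (0-based start).

```python
import csv
from typing import NamedTuple


class RefFlatHit(NamedTuple):
    gene_name: str
    chrom: str
    strand: str
    tx_start: int  # 0-based, as stored in UCSC tables
    tx_end: int


def index_refflat(lines):
    """Map transcript accession (column 2) -> list of gene name/location hits.

    A transcript can align to more than one locus, so every hit is kept.
    """
    index = {}
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#") or len(fields) < 6:
            continue  # skip header, blank, or short lines
        hit = RefFlatHit(fields[0], fields[2], fields[3], int(fields[4]), int(fields[5]))
        index.setdefault(fields[1], []).append(hit)
    return index
```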
-Galt
-------------------------------------------------------------------------------
Hi Alex,
To get what you desired, it would take 3 tables:
knownGene
kgXref
refLink
Here knownGene is the Known Genes table: its name
field is the Known Gene ID (the mRNA ID), and it has genome position
info for each gene. The kgXref table is a cross-reference table
that contains Known ID, RefSeq ID (when available), geneSymbols,
protein IDs, etc. The refLink table is another cross-reference
table for RefSeq genes. Its mrnaAcc field is the RefSeq ID, and
it has a LocusLinkId field too.
You can find detailed description of all our tables at:
http://genome.ucsc.edu/goldenPath/gbdDescriptions.html
Jim is working on our next-generation Table Browser, which I believe
will enable users to do more complex SQL operations, such as joins, to get
what you want.
In the meantime, I did two quick SQL queries:
select * from mm5.kgXref, mm5.refLink, mm5.knownGene where
knownGene.name=kgXref.kgId and kgXref.refSeq = refLink.mrnaAcc;
select * from mm5.refLink, mm5.refGene where
refGene.name=refLink.mrnaAcc;
to get two tables for you. You can find them at:
http://www.soe.ucsc.edu/~fanhsu/KnownGene.txt
http://www.soe.ucsc.edu/~fanhsu/RefSeq.txt
The Known Genes table has about 29K entries and the RefSeq table has about 18K
entries.
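Fan's first query can be tried out locally with Python's built-in sqlite3 module. The sketch uses cut-down versions of the three tables with only a few assumed columns and toy rows: BC066152, Rb1cc1 and 12421 come from Alex's example, while the RefSeq ID, positions and the second gene are placeholders. It uses explicit JOIN ... ON clauses, which is equivalent to the comma-and-WHERE form above. Because these are inner joins, a Known Gene with no RefSeq ID in kgXref drops out of the result.

```python
import sqlite3


def build_demo_db():
    """In-memory toy versions of knownGene, kgXref and refLink (assumed columns only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
        CREATE TABLE kgXref (kgId TEXT, mRNA TEXT, geneSymbol TEXT, refSeq TEXT);
        CREATE TABLE refLink (mrnaAcc TEXT, locusLinkId INTEGER);
        """
    )
    # Placeholder rows: chrom, positions and the RefSeq ID are made up for the demo.
    conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)",
                     [("BC066152", "chrN", 1000, 2000),
                      ("TOY_MRNA2", "chrN", 5000, 6000)])
    conn.executemany("INSERT INTO kgXref VALUES (?, ?, ?, ?)",
                     [("BC066152", "BC066152", "Rb1cc1", "NM_PLACEHOLDER"),
                      ("TOY_MRNA2", "TOY_MRNA2", "Toy2", "")])  # no RefSeq ID
    conn.executemany("INSERT INTO refLink VALUES (?, ?)",
                     [("NM_PLACEHOLDER", 12421)])
    return conn


def known_genes_with_locuslink(conn):
    """Same join as the first query: knownGene, kgXref and refLink."""
    query = """
        SELECT kgXref.mRNA, kgXref.geneSymbol, refLink.locusLinkId,
               knownGene.chrom, knownGene.txStart, knownGene.txEnd
        FROM kgXref
        JOIN knownGene ON knownGene.name = kgXref.kgId
        JOIN refLink ON kgXref.refSeq = refLink.mrnaAcc
        ORDER BY kgXref.mRNA
    """
    return conn.execute(query).fetchall()
```

Only the BC066152 row comes back; the toy gene without a RefSeq ID is dropped by the join.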
Hope this helps.
Fan.