RefSeq release 85 is now public

RefSeq release 85 is now accessible online, via FTP, and through NCBI’s programming utilities. This full release incorporates genomic, transcript, and protein data available as of November 6, 2017. It contains 146,710,309 records, including 100,043,962 proteins and 20,905,608 RNAs, with sequences from 73,996 organisms. The release is provided in several directories, both as a complete dataset and divided into logical groupings. See the RefSeq release notes for more information.

November 8 NCBI Minute: New API keys for better E-utilities & EDirect access to NCBI data

On Wednesday, November 8, 2017, we will present a webinar on API keys for E-utilities. In this webinar, you’ll learn how to get your API key and start using it with the E-utilities and the command-line EDirect programs.

Date and time: Wednesday, November 8, 2017, 12:00-12:30 PM EST

After registering, you will receive a confirmation email with information about attending the webinar. After the live presentation, the webinar will be uploaded to the NCBI YouTube channel. You can learn about future webinars on the Webinars and Courses page.

New API Keys for the E-utilities

If you regularly use the E-utilities API, we have important news for you: NCBI is now providing API keys for the E-utilities! After May 1, 2018, NCBI will limit your access to the E-utilities unless you have one of these keys. Obtaining an API key is quick and simple, and it will allow you to access NCBI data faster. If you don’t have an API key, the E-utilities will still work, but you may be limited to fewer requests than you would be allowed with a key.

What is an API key?

An API key is a unique string that you include in your HTTP requests that identifies you to NCBI servers. Think of the API key as a ‘turbocharger’ that lets you get more data, faster, from NCBI.
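As a minimal sketch of what "including the key in your HTTP requests" can look like, the Python snippet below builds an ESearch URL and appends the key as the `api_key` query parameter. The helper name `build_esearch_url`, the example database and search term, and the placeholder `YOUR_API_KEY` are illustrative assumptions; the snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the E-utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, api_key=None):
    """Return an ESearch URL; add api_key only when one is supplied.

    build_esearch_url is a hypothetical helper written for this example.
    """
    params = {"db": db, "term": term}
    if api_key:
        # The key travels as an ordinary query parameter on each request.
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a real key.
with_key = build_esearch_url("gene", "BRCA1", api_key="YOUR_API_KEY")
without_key = build_esearch_url("gene", "BRCA1")
print(with_key)
print(without_key)
```

A request without the key still forms a valid URL, which matches the behavior described above: the E-utilities keep working, just with a lower request allowance.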

Variation feature changes in NCBI Reference Sequences coming in 2018

Starting in March 2018, SNP variation features will no longer appear in RefSeq genome assembly records, that is, chromosome and contig records with NC_, NT_, NW_, and AC_ accession prefixes. This change affects both the ASN.1 and flatfile records. Because the number of variants is already enormous and still growing, removing SNP features from these large genomic records will significantly reduce the size of RefSeq FTP files and make them easier to download and process. We will continue to include SNPs on NG_-prefixed genomic records and on transcript (NM_, NR_, XM_, XR_) and protein (NP_, XP_, YP_) sequences.
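For pipelines that depend on SNP features in RefSeq records, the prefix rule above can be sketched as a small lookup. This is only an illustration of the announced rule: the function name `snp_features_expected` is hypothetical, the accessions are used purely as sample inputs, and prefixes not named in the announcement are reported as unknown rather than guessed.

```python
# Prefixes taken from the announcement above.
GENOME_ASSEMBLY_PREFIXES = ("NC_", "NT_", "NW_", "AC_")
SNP_RETAINED_PREFIXES = (
    "NG_",                       # genomic region records
    "NM_", "NR_", "XM_", "XR_",  # transcripts
    "NP_", "XP_", "YP_",         # proteins
)


def snp_features_expected(accession):
    """Return False for genome assembly records (SNP features dropped),
    True for record types that keep them, and None for any prefix the
    announcement does not mention.
    """
    acc = accession.strip().upper()
    if acc.startswith(GENOME_ASSEMBLY_PREFIXES):
        return False
    if acc.startswith(SNP_RETAINED_PREFIXES):
        return True
    return None


for acc in ("NC_000001.11", "NM_000546.6", "WP_000000001.1"):
    print(acc, snp_features_expected(acc))
```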

Reminder: As of September 2017, NCBI has stopped accepting submissions for non-human SNPs in dbSNP and dbVar. RefSeq flatfiles will stop presenting non-human variant data in November 2017.

Subscribe to the refseq-announce listserv for regular updates on RefSeq.

BLAST+ 2.7.1 now available

In the new version (2.7.1) of the BLAST+ executables, blastdbcmd can look up taxonomic names (e.g., scientific or common names) faster. We have also made some low-level improvements that allow BLAST to multithread more efficiently, especially when available memory is not sufficient to hold the database.

Note: Some Linux and macOS users may find that they need to increase the number of open file descriptors allowed for a process. This limit can be changed easily with “ulimit -n” (under bash). We suggest setting it to at least 1024.
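If BLAST+ is launched from a script rather than an interactive shell, the same limit can be checked and raised from within the launching process. The sketch below uses Python's standard `resource` module (available on Linux and macOS, not on Windows); the function name `ensure_open_file_limit` is an assumption made for this example. It raises only the soft limit, never beyond the hard limit, and the change applies to the current process and any child processes it starts afterwards.

```python
import resource


def ensure_open_file_limit(minimum=1024):
    """Raise the soft open-file limit to at least `minimum` when possible.

    Returns the soft limit in effect after the call. The soft limit is
    never raised above the hard limit, so the result can be lower than
    `minimum` on a tightly restricted system.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= minimum:
        return soft  # already sufficient, nothing to change
    # Cap the request at the hard limit unless the hard limit is unlimited.
    target = minimum if hard == unlimited else min(minimum, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


current = ensure_open_file_limit(1024)
print("soft open-file limit:", current)
```

A BLAST+ program started afterwards from the same script (for example with `subprocess.run`) would inherit the adjusted limit.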

See the BLAST+ release notes for more information.