Improved Text Searching

07/04

Simple text searches at rcsb.org are now easier and more accurate. Text searching from the top query bar has been redesigned: it is now powered by the open source Apache Solr platform and is based on an index of PDBx/mmCIF data.

This new functionality is accessed by entering one or more search terms in the top bar of any RCSB PDB page and hitting ‘GO’. The text search supports searches for multiple words (for example, insulin receptor) as well as searches for adjacent words, which are specified by enclosing the search terms in double quotation marks (for example, “insulin receptor”). These two types of searches may return different results: the first finds entries where the search words appear anywhere in the entry, whereas the second returns only entries where the search terms appear exactly as ordered in the query.
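The difference between the two search types can be illustrated with a minimal Python sketch. This is not how Solr implements matching; the function names and the two sample titles are made up for illustration, and the tokenization is deliberately simplistic.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(query, text):
    """Unquoted search: every query word appears somewhere in the text."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(query))

def matches_phrase(query, text):
    """Quoted search: the query words appear adjacent and in the given order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

# Hypothetical titles, not taken from real PDB entries.
title_a = "Crystal structure of the insulin receptor kinase domain"
title_b = "Insulin bound to a soluble fragment of its receptor"

for title in (title_a, title_b):
    print(matches_all_words("insulin receptor", title),
          matches_phrase("insulin receptor", title))
```

Both titles satisfy the multi-word search, but only the first satisfies the quoted search, which is why the unquoted query can return more results.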

Search results are assigned “Match Scores” that indicate the relevance of each result. These scores can be used to sort structures from “Higher to Lower” matches and vice versa. The figure below shows a search for the name Perutz.

RCSB PDB News Image
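As a rough illustration of sorting by match score, the sketch below orders a list of results in either direction. The result records, the `match_score` key, and the score values are hypothetical and do not reflect the actual data returned by the RCSB PDB site.

```python
# Hypothetical search results; identifiers and scores are placeholders.
results = [
    {"id": "ENTRY_A", "match_score": 0.42},
    {"id": "ENTRY_B", "match_score": 0.91},
    {"id": "ENTRY_C", "match_score": 0.67},
]

def sort_by_match_score(results, higher_to_lower=True):
    """Return a new list ordered by match score in the requested direction."""
    return sorted(results, key=lambda r: r["match_score"], reverse=higher_to_lower)

higher_first = [r["id"] for r in sort_by_match_score(results)]
lower_first = [r["id"] for r in sort_by_match_score(results, higher_to_lower=False)]
print(higher_first)
print(lower_first)
```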

When a search term appears in one of the following categories, the corresponding PDBx/mmCIF tokens are highlighted to help users gauge their level of interest in particular entries.

  • Structure author: one of the authors of the structure (_audit_author.name).
  • Citation author: one of the authors of the citation (primary or otherwise) corresponding to the entry. An author may appear as a citation author, structure author or both for a particular entry (_citation_author.name).
  • Citation: The title of the citation (primary or otherwise) corresponding to the entry (_citation.title).
  • Entity name: The name commonly associated with the entity matching the search. An entity is a chemically distinct part of the structure entry (_entity_name_com.name).
  • Entity Description: A description of the macromolecular contents of an entity (_entity.pdbx_description).
  • Keywords: Keywords that describe the structure (_struct_keywords.text and _struct_keywords.pdbx_keywords) defined by the authors of the entry and curated by the annotation staff. The _struct_keywords.pdbx_keywords token is displayed as “Classification” on the corresponding Structure Summary page.
  • Structure Title: The title of the structure entry (_struct.title).
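The relationship between the categories above and their PDBx/mmCIF tokens can be sketched as a simple lookup. The following Python example is only an illustration under stated assumptions: the dictionary-of-lists representation of an entry, the function name, and the sample values are hypothetical, and plain substring matching stands in for whatever the search engine actually does.

```python
# Display labels from the list above, mapped to their PDBx/mmCIF tokens.
HIGHLIGHTED_FIELDS = {
    "Structure author": ["_audit_author.name"],
    "Citation author": ["_citation_author.name"],
    "Citation": ["_citation.title"],
    "Entity name": ["_entity_name_com.name"],
    "Entity Description": ["_entity.pdbx_description"],
    "Keywords": ["_struct_keywords.text", "_struct_keywords.pdbx_keywords"],
    "Structure Title": ["_struct.title"],
}

def matching_fields(term, entry):
    """Return the labels whose token values contain the term (case-insensitive)."""
    needle = term.lower()
    hits = []
    for label, tokens in HIGHLIGHTED_FIELDS.items():
        values = [v for token in tokens for v in entry.get(token, [])]
        if any(needle in value.lower() for value in values):
            hits.append(label)
    return hits

# Made-up entry for illustration only; values are not from a real PDB file.
example_entry = {
    "_audit_author.name": ["Doe, J.", "Roe, R."],
    "_citation_author.name": ["Doe, J."],
    "_struct.title": ["Structure of an example kinase"],
    "_entity.details": ["designed model peptide"],
}

print(matching_fields("doe", example_entry))
print(matching_fields("model peptide", example_entry))
```

In this sketch an empty list means the term was not found in any of the listed fields, even though it may occur elsewhere in the entry's data.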

The figure below shows the results for an entry found with the search query "insulin receptor". Note the highlighting indicating the matching fields:

RCSB PDB News Image

This figure shows the results for an entry found with the search query insulin receptor (without quotes). More results are returned than in the previous example. Note the highlighted terms insulin, receptor, and insulin receptor:

RCSB PDB News Image

If a query match is found only in other tokens of a data file, results will be returned without highlighting and with the note “matching fields are not prominent.” The figure below shows a search for the term “model peptide”. In entry 3OTP, the term appears only in the _entity.details token in the entry’s data file.

RCSB PDB News Image


2017 New Features News Index