Scholar Request

Request Structure

The request fields are used in queries and sort operations. The request payload should comply with the following JSON schema.

Fields	Description	Required
query	Valid json search request	true
sort	Use available fields to sort results by ascending/descending order.	false
include	Only get specific fields from API response. By default all fields are selected.	false
exclude	Get all fields except undesired ones in search result.	false
size	Integer value to specify number of items per page	false
from	Integer value, defines the offset from the first result	false
scroll_id	Pagination parameter	false (true for next scroll requests)
scroll	Lifespan of the scroll context in minutes (e.g. 1m)	false (true for scroll context)
stemming	Change the ability to reduce the search word into root form	false (true by default)
regex	For Query String based queries containing regular expressions	false (false by default)
min_score	For limiting the response to the most relevant results, e.g. `"min_score": 14`	false

Searchable Fields

For searching, the following fields are supported in the API:

Group	Field	Type	Description
General	lens_id	String	Unique lens identifier e.g. `100-004-910-081-14X`
General	title	String	Title of the scholarly work e.g. `Malaria`
General	abstract	String	Scholarly work abstract text
General	full_text	String	Full Text
General	date_published	Date	Date of publication e.g. `2009-05-22`
General	year_published	Integer	Year of publication e.g. `1986`
General	created	Date	Record created date e.g. `2018-05-12`, `2016-08-01T00:00:00+00:00`
General	publication_type	String	Publication Type `conference proceedings`, `book chapter`, `journal article`, `component`, `conference proceedings article`, `dataset`, `libguide`, `reference entry`, `book`. N.B. this field is case sensitive.
General	publication_supplementary_type	String	Supplementary publication type e.g. `review`, `comparative study`, `research support`. N.B. this field is case sensitive.
General	external_id_type	String	External Identifier type (Crossref: `doi`, Microsoft Academic: `magid`, PubMed: `pmid`, PubMed Central: `pmcid`, CORE: `coreid`, OpenAlex: `openalex`)
General	retraction_update.date	Date	The date of the retraction update. Data source: Crossref/Retraction Watch. e.g. `2018-05-12`
General	retraction_update.nature	String	The nature of the retraction update (e.g. Retraction, Expression of Concern, Correction, Reinstatement). Data source: Crossref/Retraction Watch. e.g. `Retraction`
General	retraction_update.reason	String	The reason for the retraction update (e.g. investigation by journal/publisher, notice - limited or no information, concerns/issues about data, unreliable results, investigation by third party, etc.). Data source: Crossref/Retraction Watch. e.g. `Falsification/Fabrication of Image`
Authors	author.display_name	String	Author’s full name e.g. `Alexander Kupco`
Authors	author.orcid	String	Author ORCID identifier e.g. `0000-0001-5352-4498`
Authors	author.magid	String	Author MAG identifier
Authors	author.first_name	String	The author’s first name e.g. `Alexander`
Authors	author.last_name	String	The author’s last name e.g. `Kupco`
Authors	author_count	Integer	Number of Authors
Authors	author.affiliation.name	String	The institution associated with the author affiliations. e.g. `Stony Brook`
Citations	reference.lens_id	String	The Lens ID of scholarly works cited in the reference list e.g. `007-899-176-416-740`
Citations	referenced_by_count	Integer	The number of scholarly works that cite this scholarly work
Citations	reference_count	Integer	The number of works in the reference list of a scholarly work
Citations	patent_citation.lens_id	String	Lens ID of patents that reference this scholarly work. N.B. this field will be deprecated in the future; we recommend using the `referenced_by_patent.lens_id` field instead.
Citations	patent_citation_count	Integer	Number of patent citations. N.B. this field will be deprecated in the future; we recommend using the `referenced_by_patent_count` field instead.
External Identifiers	ids.doi	String	Crossref DOI Identifier
External Identifiers	ids.pmid	String	PubMed ID Identifier
External Identifiers	ids.pmcid	String	PubMed Central ID Identifier
External Identifiers	ids.magid	String	Microsoft Academic ID
External Identifiers	ids.coreid	String	CORE Identifier
External Identifiers	ids.openalex	String	OpenAlex Identifier
Source	source.title	String	The name of source publication in which the scholarly work appears e.g. Journal name, Book title, Conference proceedings
Source	source.title.exact	String	The full name of source publication for exact match. N.B. this field is case sensitive.
Source	source.publisher	String	The publisher of the source publication e.g. `Elsevier`, `Wiley`, `American Medical Association`
Source	source.country	String	The publisher’s country e.g. `United States`, `United Kingdom`. N.B. this field is case sensitive.
Source	source.asjc_code	String	The All Science Journal Classification (ASJC) code e.g. `2735`
Source	source.issn	String	The International Standard Serial Number of the source publication, without hyphenation e.g. `00222836`, `1474547x`. N.B. this field is case sensitive.
Subject Matter	field_of_study	String	Fields Of Study e.g. `Immunology`, `Malaria`
Subject Matter	source.asjc_subject	String	Subject is derived from journals descriptions in Crossref metadata based on the Science Journal Classification Codes e.g. `Pediatrics`, `Microbiology`, `Biophysics`
Subject Matter	keyword	String	Keywords for the scholarly work from PubMed. N.B. this field is case sensitive.
Subject Matter	chemical.mesh_ui	String	Chemical MeSH term unique identifier e.g. `D000293`. N.B. this field is case sensitive.
Subject Matter	chemical.registry_number	String	Chemical registration number e.g. `5Q7ZVV76EI`
Subject Matter	chemical.substance_name	String	Substance name e.g. `Antimalarials`
Subject Matter	mesh_term.mesh_heading	String	MeSH terms are the National Library of Medicine’s controlled vocabulary or medical subject headings assigned to PubMed entries. e.g. `Phosphates`, `Immunochemistry`. N.B. this field is case sensitive.
Subject Matter	mesh_term.mesh_ui	String	MeSH term unique identifier. MeSH terms are the National Library of Medicine’s controlled vocabulary or subject heading list. e.g. `D000293`. N.B. this field is case sensitive.
Institutions	author.affiliation.name.exact	String	Exactly matches the full institution name, e.g. `Stony Brook University`. N.B. this field is case sensitive.
Institutions	author.affiliation.name_original	String	The author’s original affiliation including the institution name and address. e.g. `School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts USA`
Institutions	author.affiliation.ror_id	String	The institution ROR identifier e.g. `03yrm5c26`. N.B. this field is case sensitive.
Institutions	author.affiliation.ror_id_lineage	String	The institution’s ROR identifier lineage. This includes all parent ROR identifiers for the institution. e.g. `00pjdza24`, `03yrm5c26`. N.B. this field is case sensitive.
Institutions	author.affiliation.address.city	String	The institution city e.g. `Tokyo`. N.B. this field is case sensitive.
Institutions	author.affiliation.address.state_code	String	The institution state e.g. `US-NY`. N.B. this field is case sensitive.
Institutions	author.affiliation.address.country_code	String	The alpha-2 country code of the institution e.g. `US`,`DE`,`CH`,`FR`, etc. N.B. this field is case sensitive.
Institutions	author.affiliation.type	String	The institution type e.g. `Government`, `Company`, `Facility`, `Healthcare`, `Education`. N.B. this field is case sensitive.
Funding	funding.country	String	The country of the funding body e.g. `United States`, `Germany`, `United Kingdom`
Funding	funding.funding_id	String	The funding organisation’s project identifier e.g. `U01 DE018902`. N.B. this field is case sensitive.
Funding	funding.organisation	String	Name of the funding organisation e.g. `NIDCR NIH HHS`
Funding	funding.organisation.exact	String	For exact matches of full organisational name. N.B. this field is case sensitive.
Conferences	conference.name	String	Conference Name e.g. `International Electron Devices Meeting`
Conferences	conference.instance	String	Conference Instance Name e.g. `CHI 1985`. N.B. this field is case sensitive.
Conferences	conference.location	String	The location of the conference e.g. `Lihue, Kauai, HA, USA`. N.B. this field is case sensitive.
Clinical Trials	clinical_trial.registry	String	Clinical Trial Registry e.g. `10.18810/clinical-trials-gov`. N.B. this field is case sensitive.
Clinical Trials	clinical_trial.trial_id	String	Clinical trial Identifier e.g. `nct00105716`. N.B. this field is case sensitive.
Open Access	open_access.colour	String	The Open Access colour category e.g. `gold`, `green`, `bronze`, `hybrid`, `unknown`. N.B. this field is case sensitive.
Open Access	open_access.license	String	The Open Access license type e.g. `cc-by`. N.B. this field is case sensitive.

Filtering

You can use the following pre-defined filters to refine search results:

Field	Description	Possible Value
is_referenced_by_scholarly	Indicates if the scholarly work has been cited by another scholarly work at least once.	`true`/`false`
has_patent_citations	Indicates if the scholarly work has been cited by a patent document.	`true`/`false`
has_affiliation	Has affiliation	`true`/`false`
has_affiliation_grid	Has affiliation GRID identifier. N.B. GRID identifiers will be deprecated in future and replaced with ROR identifiers	`true`/`false`
has_affiliation_ror	Has affiliation ROR identifier	`true`/`false`
has_orcid	Has an author ORCID identifier	`true`/`false`
has_mesh_term	Has MeSH term	`true`/`false`
has_chemical	Indicates if the scholarly work has an associated chemical substance	`true`/`false`
has_keyword	Indicates if the scholarly work has keyword	`true`/`false`
has_clinical_trial	Indicates if the scholarly work has clinical trial	`true`/`false`
has_field_of_study	Flags if the scholarly work has a Field of Study	`true`/`false`
has_abstract	Indicates if the scholarly work has abstract	`true`/`false`
has_full_text	Indicates if the scholarly work has fulltext	`true`/`false`
has_funding	Indicates if the scholarly work has funding information	`true`/`false`
is_open_access	Flags if the scholarly work is Open Access	`true`/`false`
in_analytics_set	Indicates if the scholarly work is part of the analytic dataset.	`true`/`false`
source.is_diamond	Non-APC Journal flag - Indicates if the journal does not have article processing charges (APCs), i.e. Diamond Open Access journals.	`true`/`false`
is_retracted	Indicates if the scholarly work has been retracted.	`true`/`false`

Example:

{
  "query": {
    "match": {
      "has_patent_citations": true
    }
  }
}

Pagination

Lens API provides two types of pagination, depending on the use case:

Offset/Size Based Pagination

Use the from parameter to define the offset and size to specify the number of records expected. This is useful when you want to skip some records and select the desired ones. The example below skips the first 100 records and selects the next 50.

{
  "query": "Malaria",
  "from": 100,
  "size":50
}

Similarly for GET requests, the following parameters are applicable: size=50&from=100

Note:

Offset/size based pagination is suitable for small result sets only and does not work on result sets of more than 10,000 records. For larger data downloads, use Cursor Based Pagination.
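
As an illustration of offset/size paging, here is a minimal Python sketch that generates the request bodies for consecutive pages. The helper name `offset_pages` and the `max_records` argument are hypothetical, and the 10,000 limit from the note above is treated here simply as a cap on the total number of records walked.

```python
import json

MAX_OFFSET_WINDOW = 10_000  # limit taken from the note above


def offset_pages(query, page_size=50, max_records=200):
    """Yield request bodies that walk a result set using from/size."""
    if max_records > MAX_OFFSET_WINDOW:
        raise ValueError("use cursor based pagination beyond 10,000 records")
    for offset in range(0, max_records, page_size):
        # The last page may be shorter than page_size.
        size = min(page_size, max_records - offset)
        yield {"query": query, "from": offset, "size": size}


pages = list(offset_pages("Malaria", page_size=50, max_records=200))
# The third page skips the first 100 records and selects the next 50.
print(json.dumps(pages[2]))
```

Each yielded dictionary can be serialised with `json.dumps` and sent as the body of a POST request.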

Cursor Based Pagination

You can specify the number of records per page using size (default 20, maximum 1000) and the lifespan of the scroll context using scroll (default 1 minute). The response contains a scroll_id, which should be passed in the request body to access the next set of results. Since the scroll_id tends to change after each successful request, always use the most recent scroll_id to access the next page. Cursor based pagination is not suited to real-time user requests.

{
  "scroll_id": "MjAxOTEw;DnF1ZXJ...R2NZdw==",
  "scroll": "1m"
}

Note:

The lifespan of a scroll_id is limited to 1 minute in the current API version. Using an expired scroll_id will result in a Bad Request HTTP response.

The size parameter is used for the first scroll query and remains the same for the whole scroll context. Hence, passing size in subsequent scroll requests has no effect.

Cursor based pagination is only applicable to POST requests.

For optimal performance, we recommend limiting the number of items (e.g. lens_ids) in a single terms query to 10,000.

If no further results are found, the response will be 204 and the scroll context is invalidated. A subsequent request with the same scroll_id will return 400.
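
The scroll flow described above can be sketched in Python as follows. This is only an outline under stated assumptions: `post` is a hypothetical callable that sends a request body and returns `(status_code, json_dict)`, and the response key `data` holding the records is assumed for illustration. A stand-in transport is included so that the sketch runs without network access.

```python
def scroll_all(post, query, size=1000, scroll="1m"):
    """Collect all records using cursor based pagination.

    `post` is a hypothetical callable: post(body) -> (status_code, json_dict).
    """
    # size is only sent with the first request of the scroll context.
    body = {"query": query, "size": size, "scroll": scroll}
    records = []
    while True:
        status, payload = post(body)
        if status == 204:  # no further results; scroll context is invalidated
            break
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        records.extend(payload.get("data", []))  # "data" key is an assumption
        # Always continue with the most recent scroll_id.
        body = {"scroll_id": payload["scroll_id"], "scroll": scroll}
    return records


def make_fake_post(responses):
    """Stand-in transport for the demo; records each request body."""
    sent = []
    queue = list(responses)

    def post(body):
        sent.append(body)
        return queue.pop(0)

    return post, sent


post, sent = make_fake_post([
    (200, {"data": ["r1", "r2"], "scroll_id": "id-1"}),
    (200, {"data": ["r3"], "scroll_id": "id-2"}),
    (204, {}),
])
records = scroll_all(post, "Malaria", size=2)
print(records)
```

In real use, `post` would wrap an HTTP client call to the search endpoint with the appropriate authentication.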

Sorting

Results can be retrieved in ascending or descending order. Use the following format and fields to apply sorting to the API response. Results can also be sorted by relevance score using relevance.

{
  "sort": [
      {"patent_citation_count":"desc"},
      {"year_published": "asc"},
      {"relevance": "desc"}
  ]
}

For GET requests, the following structure is applicable: sort=desc(patent_citation_count),asc(date_published),desc(relevance)
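
To show how the two sort notations relate, here is a small Python sketch that converts the JSON sort list into the GET form. The function name `sort_to_get_param` is hypothetical and not part of the API.

```python
def sort_to_get_param(sort):
    """Render a JSON-style sort list as the GET sort parameter."""
    parts = []
    for clause in sort:
        for field, order in clause.items():
            if order not in ("asc", "desc"):
                raise ValueError(f"unknown sort order: {order!r}")
            parts.append(f"{order}({field})")
    return "sort=" + ",".join(parts)


sort = [
    {"patent_citation_count": "desc"},
    {"year_published": "asc"},
    {"relevance": "desc"},
]
print(sort_to_get_param(sort))
```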

Projection

You can control the output fields in the API response using projection. There are two ways to do this:

include: Only request specific fields from the API endpoint
exclude: Fields to be excluded from result

 {"include":["title","patent_citations","authors.affiliations.name"]}

 {"exclude":["external_ids","references"]}

For GET requests, the following structure is applicable: include=authors,lens_id

Note: Both include and exclude can be used in the same request.
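
A minimal Python sketch of adding projection to a request is shown below. The helper names `with_projection` and `include_to_get_param` are hypothetical. Only the GET form of include is rendered, since that is the only GET form shown above.

```python
def with_projection(request, include=None, exclude=None):
    """Return a copy of `request` with include/exclude projection added."""
    projected = dict(request)
    if include:
        projected["include"] = list(include)
    if exclude:
        projected["exclude"] = list(exclude)
    return projected


def include_to_get_param(include):
    """Render the include list in the GET form shown above."""
    return "include=" + ",".join(include)


body = with_projection({"query": "Malaria"}, include=["title", "lens_id"])
print(body)
print(include_to_get_param(["authors", "lens_id"]))
```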

Stemming

Stemming reduces words to their root form, e.g. constructed and constructing are both stemmed to the root construct. Since the default stemming may not always give the exact results you expect, you can disable it so that only the provided form of the word is searched, e.g. "stemming": false

Regex

Regex allows the use of regular expressions in Query String based query, e.g. "regex": true

{
    "query": "field_of_study:/.*[Ee]conom.*/",
    "regex": true
}

Minimum Score

The minimum score represents the relevance score based on the query matching score used in Elasticsearch. It can be used to limit the response to the most relevant results, in two steps:

Perform an initial API request to get the max_score. N.B. the size of the request needs to be greater than 0 to return the max_score.
You can then filter by the min_score in subsequent requests.

For example, if the max_score is 14.9 and there are 236K results in total from the initial request, you can pass the min_score as 14 (i.e. less than max_score) in the subsequent request to limit the response to the most relevant results only.

Note:

The max_score will be returned as 0 if size is 0 or if the results are sorted by any field other than relevance, i.e. {"relevance": "desc"}.

Passing the min_score as x% of max_score may not result in the top x% of results.

The score is calculated for each query by Elasticsearch, so the max_score value will be different for each query.
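
The two-step use of min_score can be sketched in Python as follows. Several details are assumptions made for illustration: `post` is a hypothetical callable that sends a request body and returns the parsed JSON response, `max_score` is assumed to be a top-level key of that response, and rounding down mirrors the 14.9 to 14 example above (any threshold below max_score could be used instead).

```python
import math


def two_step_min_score(post, query, size=10):
    """Probe for max_score, then repeat the query with a min_score filter."""
    # Step 1: size must be greater than 0 for max_score to be returned.
    probe = post({"query": query, "size": 1})
    max_score = probe.get("max_score", 0)  # response key is an assumption
    if max_score <= 0:
        raise ValueError("max_score is 0; check size and sort settings")
    # Step 2: pick a threshold below max_score (here: rounded down).
    min_score = math.floor(max_score)
    return post({"query": query, "size": size, "min_score": min_score})


sent = []


def fake_post(body):
    """Stand-in transport for the demo; always reports max_score 14.9."""
    sent.append(body)
    return {"max_score": 14.9, "total": 236000}


two_step_min_score(fake_post, "malaria")
print(sent[1])
```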

Supported Query Types

The following queries are supported by the current version of Lens API:

Note: The Lens API query requests use a modified form of the Elasticsearch Query DSL. For more details on the Elasticsearch query syntax, we recommend reading this guide on the query syntax: Elasticsearch Query DSL

Term Query

Term Query operates on a single term and searches for the exact term in the field provided.

Example: Find record by publication type

{
    "query": {
        "term": {
            "publication_type": "journal article"
        }
    }
}

Terms Query

Terms Query allows you to search for multiple exact terms in a provided field. This is useful when searching for multiple identifiers.

Example: Search for scholarly works by multiple PMIDs

{
    "query": {
        "terms": {
            "pmid": ["14297189", "17475107"]
        }
    }
}

Note: Avoid using the Term and Terms queries for text fields. To search text field values, we recommend using the Match and Match Phrase queries instead.
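
When a list of identifiers is long, it can be split across several terms queries, in line with the recommendation in the pagination notes to keep a single terms query to 10,000 items. Below is a minimal Python sketch; the helper name `terms_batches` and the generated identifiers are hypothetical.

```python
def terms_batches(field, values, batch_size=10_000):
    """Yield terms-query request bodies with at most batch_size values each."""
    values = list(values)
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        yield {"query": {"terms": {field: chunk}}}


# 25,000 placeholder identifiers, used only to show the batching.
ids = [f"id-{i}" for i in range(25_000)]
batches = list(terms_batches("lens_id", ids))
print(len(batches))
```

Each batch is an independent request, so the results need to be combined on the client side.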

Match query

Match query accepts text/numbers/dates. The main use case of the match query is full-text search. It matches each word separately. If you need to search for a whole phrase, use the match phrase query.

Example:

{
  "query": {
    "match": {
      "author.affiliation.name": "Harvard University"
    }
  }
}

Match Phrase query

Match phrase query accepts text/numbers/dates. The main use case for the match phrase query is matching an exact phrase rather than individual words.

Example:

{
  "query": {
    "match_phrase": {
      "author.affiliation.name": "Harvard University"
    }
  }
}

Note: Both Match and Match Phrase are used for text searching but the difference is how they do it. For example, searching for "Cleveland, OH" differs between Match and Match Phrase like this:

Match: standard search in which each word is matched separately (for example: Cleveland OR OH)

Match Phrase: matches the exact phrase provided. In this case it will match the exact text Cleveland, OH

Range query

Use a Range query to match records within the provided range.

Example: Get records published between the years 1980 and 2000

{
  "query": {
    "range": {
      "year_published": {
        "gte": "1980",
        "lte": "2000"
      }
    }
  }
}

Boolean query

Bool Query allows you to combine multiple queries into a complex query that gives more precise results. It can be created using one or more of these clauses: must, should, must_not and filter. You can use must for an AND operation and should for OR.

Example: Get journal articles by an author with the last name Kondratyev that have patent citations.

{
 "query": {
   "bool": {
     "must": [{
       "match": {
         "has_patent_citations": true
       }},
       {"bool": {
         "must": [
           {"match": {"publication_type": "journal article"}},
           {"match": {"author.last_name": "Kondratyev"}}
         ]
       }
       }
     ]
   }
 }
}
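
Bool queries can also be assembled programmatically. The Python sketch below uses a hypothetical helper, `bool_query`, and builds a flattened form of the example above: all three conditions sit in a single must list, which expresses the same AND logic as the nested version.

```python
import json


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Build a bool query request body, omitting empty clauses."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filters),
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


query = bool_query(must=[
    {"match": {"has_patent_citations": True}},
    {"match": {"publication_type": "journal article"}},
    {"match": {"author.last_name": "Kondratyev"}},
])
print(json.dumps(query, indent=2))
```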

Query String Based Query

Query different terms with explicit operators AND/OR/NOT to create a compact query string.

Example: Find works from an institution, published between two years, with a given title.
{"query": "(title:Dimensions AND author.affiliation.name:(Harvard University)) AND year_published:[2000 TO 2018]"}

If you need to use any reserved special characters, you should escape them with a leading backslash.

Example: Getting by doi identifier using string based query
{"query": "doi:10.1109\\/ee.1934.6540358"}
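
Escaping can be automated before a value is placed into a query string. The Python sketch below is an illustration only: the helper name `escape_query_value` is hypothetical, and the set of reserved characters is taken from the Elasticsearch query string syntax on the assumption that the same set applies here. Note that `json.dumps` doubles the backslash, which gives the form shown in the example above.

```python
import json
import re

# Reserved characters of the Elasticsearch query string syntax
# (assumed to apply to this API as well).
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_value(value):
    """Prefix each reserved character in `value` with a backslash."""
    return _RESERVED.sub(r"\\\1", value)


escaped = escape_query_value("10.1109/ee.1934.6540358")
body = json.dumps({"query": "doi:" + escaped})
print(body)
```

Escape only the value, not the whole query string, otherwise operators, field separators and intentional wildcards would be escaped as well.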

You can also use the JSON based format for a string based query and mix it with complex boolean queries like this:

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "\"X-ray analysis of protein crystals\"",
                        "fields": [
                            "title",
                            "abstract",
                            "full_text"
                        ],
                        "default_operator": "and"
                    }
                }
            ],
            "filter": [
                {
                    "term": {
                        "has_affiliation": true
                    }
                }
            ]
        }
    }
}

Note: You can specify the field ui_default in a query_string query to replicate the same search behaviour as a query string on the lens.org user interface.

{
    "query": {
        "query_string": {
            "query": "malaria vaccine",
            "default_operator": "and",
            "fields": [
                "ui_default"
            ]
        }
    }
}