Patent Request
Request Structure
The request fields are used in queries and sort operations. The request payload should comply with the following JSON schema.
Fields | Description | Required |
---|---|---|
query | Valid json search request | true |
sort | Use available fields to sort results by ascending/descending order. | false |
include | Only get specific fields from API response. By default all fields are selected. | false |
exclude | Get all fields except undesired ones in search result. | false |
size | Integer value to specify number of items per page | false |
from | Integer value, defines the offset from the first result | false |
scroll_id | Pagination parameter | false (true for next scroll requests) |
scroll | Lifespan of the scroll context in minutes (e.g. 1m) | false (true for scroll context) |
stemming | Enable or disable reducing search words to their root form | false (true by default) |
language | For multi-lingual fulltext search | false (EN by default) |
regex | For Query String based queries containing regular expressions | false (false by default) |
group_by | For group by patent family queries. Supports group by SIMPLE_FAMILY and EXTENDED_FAMILY | false |
expand_by | For expand by patent family queries. Supports expand by SIMPLE_FAMILY and EXTENDED_FAMILY | false |
min_score | For limiting the response to the most relevant results, e.g. "min_score": 14 | false |
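As an illustration of the table above, the following Python sketch assembles a request payload and rejects field names the table does not list. The helper name build_request is hypothetical and not part of the Lens API; the sketch only builds the JSON body and does not send it.

```python
import json

# Field names taken from the request structure table above.
ALLOWED_FIELDS = {
    "query", "sort", "include", "exclude", "size", "from",
    "scroll_id", "scroll", "stemming", "language", "regex",
    "group_by", "expand_by", "min_score",
}


def build_request(query, **options):
    """Return a request payload dict, rejecting fields the table does not list."""
    unknown = set(options) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unsupported request fields: {sorted(unknown)}")
    payload = {"query": query}
    payload.update(options)
    return payload


payload = build_request(
    {"match": {"title": "nanotechnology"}},
    size=50,
    stemming=False,
    language="EN",
)
body = json.dumps(payload)  # JSON string to use as the POST body
```

Because from is a reserved word in Python, it has to be passed as build_request(query, **{"from": 100}).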
Searchable Fields
For searching, the following fields are supported by the API:
Filtering
You can use the following pre-defined filters to refine your search results:
Example:
{
"query": {
"match":{
"has_full_text": true
}
}
}
Pagination
The Lens API provides two types of pagination, depending on the use case:
Offset/Size Based Pagination
Use the from parameter to define the offset and size to specify the number of records expected. This is useful when you want to skip some records and select the desired ones. The example below skips the first 100 records and selects the next 50.
{
"query": "nanotechnology",
"from": 100,
"size":50
}
Similarly, for GET requests, the following parameters are applicable: size=50&from=100
Note:
- Offset/size based pagination is suitable for small result sets only and does not work on result sets of more than 10,000 records. For larger volume data downloads, use Cursor Based Pagination.
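A minimal sketch of offset/size paging in Python. The generator name offset_pages is hypothetical, and the sketch assumes the 10,000-record limit applies to the last record reachable through from plus size. It only builds the request payloads.

```python
MAX_OFFSET_WINDOW = 10_000  # limit stated in the note above


def offset_pages(query, total, size=50):
    """Yield request payloads covering `total` records, capped at the offset limit."""
    limit = min(total, MAX_OFFSET_WINDOW)
    for start in range(0, limit, size):
        # Shrink the last page so it never reaches past the limit.
        yield {"query": query, "from": start, "size": min(size, limit - start)}


pages = list(offset_pages("nanotechnology", total=120, size=50))
```

For result sets larger than the limit, the generator stops at 10,000 records rather than issuing requests that would fail.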
Cursor Based Pagination
You can specify the number of records per page using size (default 20, maximum 100-500; refer to your API plan for your maximum records per request) and the context alive time using scroll (default 1 minute). You will receive a scroll_id in the response, which should be passed in the request body to access the next set of results. Since the scroll_id can change after each successful request, always use the most recent scroll_id to access the next page. Cursor based pagination is not suited for real-time user requests.
{
"scroll_id": "MjAxOTEw;DnF1ZXJ...R2NZdw==",
"scroll": "1m"
}
Note:
- The lifespan of a scroll_id is limited to 1 minute for the current API version. Using an expired scroll_id will result in a bad request HTTP response.
- The size parameter is used for the first scroll query and remains the same for the whole scroll context. Hence, passing size in subsequent scroll requests has no effect.
- Cursor based pagination is only applicable to POST requests.
- For optimal performance, we recommend limiting the number of items (e.g. lens_ids) in a single terms query to 10,000.
- If no further results are found, the response will be 204 and the scroll context is invalidated. A subsequent request using the same scroll_id will return 400.
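The scroll flow described above can be sketched in Python as follows. The post argument is a hypothetical transport callable that sends a payload and returns a (status_code, body) pair. The fake transport and its data field exist only to make the sketch self-contained and are not taken from the API.

```python
def scroll_all(post, query, size=100, scroll="1m"):
    """Yield each page of results, following scroll_id until a 204 arrives.

    `post` is a hypothetical callable: post(payload) -> (status_code, body_dict).
    """
    # size is only sent once; it stays fixed for the whole scroll context.
    status, body = post({"query": query, "size": size, "scroll": scroll})
    while status == 200:
        yield body
        # Always pass the most recent scroll_id from the previous response.
        status, body = post({"scroll_id": body["scroll_id"], "scroll": scroll})
    if status != 204:  # 204 means no further results
        raise RuntimeError(f"scroll request failed with HTTP {status}")


# Fake transport used only for illustration.
_responses = iter([
    (200, {"scroll_id": "a", "data": [1, 2]}),
    (200, {"scroll_id": "b", "data": [3]}),
    (204, {}),
])
sent = []


def fake_post(payload):
    sent.append(payload)
    return next(_responses)


pages = list(scroll_all(fake_post, "nanotechnology"))
```

The loop stops on the first 204 and never reuses the invalidated scroll_id, which avoids the 400 response mentioned in the notes.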
Sorting
Results can be retrieved in ascending or descending order. Use the following format and fields to apply sorting to the API response. Results can also be sorted by relevance score using relevance.
{
"sort": [
{"reference_cited.patent_count":"desc"},
{"year_published": "asc"},
{"relevance": "desc"}
]
}
For GET requests, the following structure is applicable: sort=desc(reference_cited.patent_count),asc(date_published),desc(relevance)
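To show how the two sort notations relate, this Python sketch converts the JSON sort list into the GET form. The function name sort_to_get_param is hypothetical, and URL-encoding of the resulting value is left to the HTTP client.

```python
def sort_to_get_param(sort):
    """Turn [{"field": "asc"}, ...] into 'asc(field),...' for the sort= parameter."""
    parts = []
    for clause in sort:
        for field, order in clause.items():
            if order not in ("asc", "desc"):
                raise ValueError(f"invalid sort order for {field!r}: {order!r}")
            parts.append(f"{order}({field})")
    return ",".join(parts)


sort = [
    {"reference_cited.patent_count": "desc"},
    {"year_published": "asc"},
    {"relevance": "desc"},
]
param = sort_to_get_param(sort)
```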
Projection
You can control the output fields in the API Response using projection. There are two possible ways to do that.
- include: Only request specific fields from the API endpoint
- exclude: Fields to be excluded from result
{"include":["lens_id", "title","description","claim"]}
{"exclude":["legal_status","biblio.classifications_cpc"]}
For GET requests, the following structure is applicable:
include=lens_id,title,description,claim
Note: Both include and exclude can be used in the same request.
Stemming
Stemming reduces words to their root form. E.g. constructed and constructing will both be stemmed to the root construct.
Since the default stemming might sometimes not give you exact results, disabling it will search only for the provided form of the word.
e.g. "stemming": false
Language
Available search language codes include:
- AR = Arabic
- DE = German
- EN = English
- ES = Spanish
- FR = French
- JA = Japanese
- KO = Korean
- PT = Portuguese
- RU = Russian
- ZH = Chinese
Regex
Regex allows the use of regular expressions in Query String based queries, e.g. "regex": true
{
"query": "field_of_study:/.*[Ee]conom.*/",
"regex": true
}
Example: Matching kind codes using regular expression:
{
"query": "kind:/A[2-4]/",
"regex": true
}
Group by Family
Group by patent family queries support grouping by SIMPLE_FAMILY and EXTENDED_FAMILY, e.g. "group_by": "SIMPLE_FAMILY". This returns the top sorted patent document record for each family (sorted by relevance by default).
Expand by Family
Expand by patent family queries support expanding by SIMPLE_FAMILY and EXTENDED_FAMILY, e.g. "expand_by": "SIMPLE_FAMILY". This returns all the patent family members from the patent documents that match your query.
Note:
- Group by family does not work with scroll requests.
Minimum Score
The minimum score represents the relevance score based on the query matching score used in Elasticsearch. This can be used to limit the response to the most relevant results and is applied in two steps:
- Perform an initial API request to get the max_score. N.B. the size of the request needs to be greater than 0 to return the max_score.
- You can then filter by the min_score in subsequent requests.
For example, if the max_score is 14.9 and there are 236K results in total from the initial request, you can pass the min_score as 14 (i.e. less than max_score) in the subsequent request to limit the response to the most relevant results only.
Note:
- The max_score will be returned as 0 if size is 0 or if a sort is applied.
- Passing the min_score as x% of max_score may not result in the top x% of results.
- The score is calculated for each query by Elasticsearch, and so the max_score value will be different for each query.
- The max_score will be returned as 0 if sorting by any field other than relevance, i.e. {"relevance": "desc"}.
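The two-step use of min_score can be sketched in Python as below. The search argument is a hypothetical callable that sends a payload and returns the response as a dict containing max_score. The fraction parameter and the total field in the fake response are illustrative assumptions; as noted above, a fraction of max_score does not correspond to the same fraction of results.

```python
def most_relevant(search, query, fraction=0.9, size=10):
    """Two-step min_score search.

    `search` is a hypothetical callable: search(payload) -> response dict
    containing "max_score".
    """
    # Step 1: size must be > 0 and no sort applied, or max_score comes back as 0.
    first = search({"query": query, "size": 1})
    max_score = first["max_score"]
    if not max_score:
        return first  # no usable score, so there is nothing to filter on
    # Step 2: repeat the query, keeping only results scoring near the maximum.
    return search({"query": query, "size": size, "min_score": max_score * fraction})


# Fake search function used only for illustration.
calls = []


def fake_search(payload):
    calls.append(payload)
    return {"max_score": 14.9, "total": 236000}


result = most_relevant(fake_search, "nanotechnology")
```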
Supported Query Types
The following queries are supported by the current version of the Lens API:
Note: The Lens API query requests use a modified form of the Elasticsearch Query DSL. For more details on the Elasticsearch query syntax, we recommend reading this guide on the query syntax: Elasticsearch Query DSL
Term Query
Term Query operates on a single term and searches for the exact term in the field provided.
Example: Find record by publication type
{ "query": { "term": { "publication_type": "GRANTED_PATENT" } } }
Terms Query
Terms Query allows you to search for multiple exact terms in a given field. This is useful when searching for multiple identifiers.
Example: Search for multiple document numbers
{ "query": { "terms": { "doc_number": ["20130227762", "1117265"] } } }
Note:
- Avoid using the Term and Terms queries for text fields. To search text field values, we recommend using the Match and Match Phrase queries instead.
Match query
Match query accepts text/numbers/dates. The main use case of the match query is full-text search. It matches each word separately. If you need to search for a whole phrase, use the match phrase query.
Example: Get patents filed by IBM
{ "query": { "match":{ "applicant.name": "IBM" } } }
Match Phrase query
Match phrase query accepts text/numbers/dates. Its main use case is full-text search for an exact phrase.
Example: Get patents filed by IBM
{ "query": { "match_phrase":{ "applicant.name": "IBM" } } }
Note: Both Match and Match Phrase are used for text searching but the difference is how they do it. For example, searching for "Cleveland, OH" differs between Match and Match Phrase like this:
- Match: standard search in which each word is matched separately (for example: Cleveland OR OH)
- Match Phrase: matches the exact phrase provided. In this case it will match the exact text Cleveland, OH
Range query
Range query matches records within the provided range.
Example: Get patents published between years 1980 and 2000
{ "query": { "range": { "year_published": { "gte": "1980", "lte": "2000" } } } }
Example: Filter documents which have a Patent Term Extension
{ "query": { "range": { "legal_status.term_extension_days": { "gt": 0 } } } }
Boolean query
Bool Query allows you to combine multiple queries into a complex query that gives precise results. It can be created using one or more of these clauses: must, should, must_not and filter. You can use must for AND operations and should for OR.
Example: Search for granted patents from inventors named “Engebretson” that have been cited by other patents.
{ "query": { "bool": { "must": [ { "match": { "cited_by_patent": "true" } }, { "match": { "publication_type": "GRANTED_PATENT" } }, { "match": { "inventor.name": "Engebretson" } } ] } } }
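A short Python sketch of assembling such a bool query from its clauses. The helper name bool_query is hypothetical. It only accepts the four clause names listed above and omits empty clauses from the payload.

```python
BOOL_CLAUSES = ("must", "should", "must_not", "filter")


def bool_query(**clauses):
    """Assemble a bool query payload, omitting empty clauses."""
    unknown = set(clauses) - set(BOOL_CLAUSES)
    if unknown:
        raise ValueError(f"unknown bool clauses: {sorted(unknown)}")
    body = {name: list(clauses[name]) for name in BOOL_CLAUSES if clauses.get(name)}
    return {"query": {"bool": body}}


payload = bool_query(
    must=[
        {"match": {"publication_type": "GRANTED_PATENT"}},
        {"match": {"inventor.name": "Engebretson"}},
    ],
    filter=[{"term": {"has_owner": True}}],
)
```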
Query String Based Query
Query different terms with explicit operators AND/OR/NOT to create a compact query string.
Example: Find patents with javascript in the title that have been filed by IBM and published between 2000 and 2018.
{"query": "(title:javascript AND applicant.name:(IBM)) AND year_published:[2000 TO 2018]"}
If you need to use any reserved special characters, you should escape them with a leading backslash.
Example: Searching by CPC code using string based query
{"query": "class_cpc.symbol:Y02E10\\/70"}
Lucene reserved words like OR, AND can be escaped if you are searching for them in the value. Example:
{ "query": "applicant.name:\\OR" }
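The escaping rule can be sketched in Python as follows. The function name escape_term is hypothetical, and the character set is the standard Lucene query-syntax list of reserved characters, assumed here to apply to the Lens API as well. The sketch does not handle reserved words such as OR or whitespace inside a value. Note that the backslash appears doubled only in the JSON text, which json.dumps takes care of.

```python
import json
import re

# Single-character Lucene query-syntax specials (&& and || are covered via & and |).
_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(value):
    """Prefix each reserved character in a field value with a backslash."""
    return _SPECIALS.sub(r"\\\1", value)


# The query string holds a single backslash before the slash ...
query = f"class_cpc.symbol:{escape_term('Y02E10/70')}"
# ... and JSON serialisation doubles it, matching the CPC example above.
body = json.dumps({"query": query})
```

Only the field value is passed through escape_term, so the colon separating the field name from the value is left intact.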
You can also use the JSON based format for a string based query and mix it with complex boolean queries like this:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "crispr-cas9",
"fields": [
"title",
"claims",
"description"
],
"default_operator": "or"
}
}
],
"filter": [
{
"term": {
"has_owner": true
}
}
]
}
}
}