Pipeline search

Pipeline search lets you chain multiple searches across different datasets in a single request. Each step can extract values from its results and inject them into the next step, so you can correlate data across datasets without writing custom scripts.

For example, you can find all domains registered by an organization in WHOIS, then look up their DNS records, and finally check which hosts are online, all in one API call.

Warning: Pipeline search requires a Basic account at minimum. Basic users are limited to 3 steps, 15 results per step and 20 chained values. Premium subscribers get up to 5 steps, 50 results per step and 100 chained values.

Tip: Before using the example commands below, define your API key in a variable, replacing xxx with your actual key. See how to retrieve your API key here.

API_KEY=xxx

How it works

A pipeline is a sequence of steps. Each step queries one of the four datasets (hosts, dns, whois, vhosts) and optionally extracts a field from the results. The extracted values are then injected into the next step's query through the {$input} placeholder.

Step 1: WHOIS search → extract "domain"
↓ [acme.com, acme.net, acme.org]
Step 2: DNS search (host:{$input}) → extract "value"
↓ [1.2.3.4, 5.6.7.8]
Step 3: Hosts search (resolution:{$input})
↓ final results

When {$input} is replaced, the pipeline automatically builds an OR clause matching each extracted value. For host fields, it also matches all subdomains (e.g. host:"acme.com" OR host:"*.acme.com").
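The three-step flow in the diagram above is not shown as a request body elsewhere, so here is a minimal sketch of how it could be assembled. The organization name and the step 1 query are assumptions borrowed from the examples further down; the helper name `build_three_step_pipeline` is hypothetical.

```python
import json

# Hypothetical organization name, borrowed from the examples in this document.
ORG = "Acme Corp"


def build_three_step_pipeline(org: str) -> dict:
    """Build a WHOIS -> DNS -> hosts request body matching the diagram."""
    return {
        "steps": [
            {
                # Step 1: find the organization's domains (assumed query).
                "source": "whois",
                "query": f'registrant_organization:"{org}"',
                "extract_key": "domain",
            },
            {
                # Step 2: {$input} receives the domains extracted by step 1.
                "source": "dns",
                "query": "host:{$input}",
                "extract_key": "value",
            },
            {
                # Step 3: last step, so extract_key can be omitted.
                "source": "hosts",
                "query": "resolution:{$input}",
            },
        ],
        "limit": 10,
        "page": 1,
    }


payload = build_three_step_pipeline(ORG)
print(json.dumps(payload, indent=2))
```

The printed JSON can be passed as the request body of the call described under Request format.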

Request format

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [...],
    "limit": 10,
    "page": 1
  }'
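For readers who prefer a script to curl, the same request might be sent from Python with only the standard library. This is a sketch, not an official client: the function names `build_pipeline_request` and `run_pipeline` are hypothetical, and only the URL, headers and body fields come from the curl command above.

```python
import json
import os
import urllib.request

PIPELINE_URL = "https://api.profundis.io/api/v2/data/auth/pipeline/search"


def build_pipeline_request(steps, api_key, limit=10, page=1):
    """Prepare the POST request without sending it."""
    body = json.dumps({"steps": steps, "limit": limit, "page": page}).encode("utf-8")
    return urllib.request.Request(
        PIPELINE_URL,
        data=body,
        method="POST",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )


def run_pipeline(steps, api_key, **kwargs):
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(build_pipeline_request(steps, api_key, **kwargs)) as resp:
        return json.load(resp)


# Build (but do not send) a single-step request using the API_KEY variable.
request = build_pipeline_request(
    [{"source": "whois", "query": "domain:target.com"}],
    api_key=os.environ.get("API_KEY", "xxx"),
)
```

Separating request construction from sending keeps the body easy to inspect before it counts against your quota.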

Top-level parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| steps | array | required | Ordered list of pipeline steps (1 to 5) |
| limit | integer | 50 | Maximum results returned per step |
| page | integer | 1 | Page number for pagination |
| stream_mode | boolean | false | Enable cursor-based pagination for large result sets (Premium+, single-step only) |
| cursor | string | - | Opaque pagination token from a previous response (used with stream_mode) |

Step parameters

Each object in the steps array accepts the following parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | string | yes | Dataset to query: hosts, dns, whois or vhosts |
| query | string | yes | Search query. Use {$input} as a placeholder for values extracted from the previous step |
| extract_key | string | yes* | Field to extract from results for the next step. Required on all steps except the last one |
| aggregate | object | no | Aggregation to perform on this step (see Aggregations) |
| exclude | array | no | List of values to exclude from results and extracted keys |
| exclude_pattern | string | no | Wildcard pattern to exclude (e.g. *.cloudflare.com) |
| dedupe | boolean | no | Deduplicate extracted keys (case-insensitive) |
| max_extract | integer | no | Limit the number of values passed to the next step (default: config max) |

Extractable fields per source

These are the fields you can use as extract_key to chain values between steps.

| Source | Available extract keys |
| --- | --- |
| hosts | host, resolution, as_number, favicon_hash, analytics_tags, cert_subj_org |
| dns | host, value, type |
| whois | domain, registrant_email, registrant_organization, nameservers |
| vhosts | san, resolution, cert_org, cert_common_name |

Info: When extracting nameservers, public nameservers (Cloudflare, Akamai, AWS, etc.) are automatically filtered out since they don't indicate a meaningful relationship between domains.
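The step rules above can be checked locally before a request is sent. The sketch below is a hypothetical client-side helper, not part of the API: it encodes the extractable fields per source and the rule that extract_key is required on every step except the last. Treating a step with `aggregate.use_as_next` as exempt is an inference from the aggregation example later in this page.

```python
# Extractable fields per source, copied from the table above.
EXTRACT_KEYS = {
    "hosts": {"host", "resolution", "as_number", "favicon_hash", "analytics_tags", "cert_subj_org"},
    "dns": {"host", "value", "type"},
    "whois": {"domain", "registrant_email", "registrant_organization", "nameservers"},
    "vhosts": {"san", "resolution", "cert_org", "cert_common_name"},
}


def validate_steps(steps):
    """Return a list of problems; an empty list means the steps look valid."""
    problems = []
    for index, step in enumerate(steps, start=1):
        source = step.get("source")
        if source not in EXTRACT_KEYS:
            problems.append(f"step {index}: unknown source {source!r}")
            continue
        key = step.get("extract_key")
        is_last = index == len(steps)
        # Assumption: bucket keys can replace extract_key when use_as_next is set.
        uses_buckets = step.get("aggregate", {}).get("use_as_next", False)
        if key is None:
            if not is_last and not uses_buckets:
                problems.append(f"step {index}: extract_key is required before the last step")
        elif key not in EXTRACT_KEYS[source]:
            problems.append(f"step {index}: {key!r} is not extractable from {source}")
    return problems
```

An empty list only means the request is well-formed under these assumptions; the API remains the authority on what it accepts.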

Examples

Find CNAME records that don't point to a specific target

Search all domains owned by an organization, then find DNS CNAME records that don't point to the expected value - useful for detecting potential subdomain takeovers or misconfigurations.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "registrant_organization:\"Acme Corp\"",
        "extract_key": "domain"
      },
      {
        "source": "dns",
        "query": "host:{$input} AND type:CNAME AND NOT value:\"expected-target.acme.com\"",
        "extract_key": "value"
      }
    ],
    "limit": 10,
    "page": 1
  }'

You can also exclude an entire pattern from the CNAME targets using NOT value:*.amazonaws.com in the query or the exclude_pattern parameter:

{
  "source": "dns",
  "query": "host:{$input} AND type:CNAME",
  "extract_key": "value",
  "exclude_pattern": "*.amazonaws.com"
}

Correlate shared infrastructure via nameservers

Find domains that share the same private nameservers as a target domain - a technique for discovering related assets.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "domain:target.com",
        "extract_key": "nameservers"
      },
      {
        "source": "whois",
        "query": "nameservers:{$input}",
        "extract_key": "domain",
        "exclude": ["target.com"],
        "dedupe": true
      }
    ],
    "limit": 10,
    "page": 1
  }'

DNS to hosts correlation

Find all IPs behind a domain's DNS records, then check what other hostnames resolve to those IPs.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "dns",
        "query": "host:*.target.com AND type:A",
        "extract_key": "value"
      },
      {
        "source": "hosts",
        "query": "resolution:{$input}",
        "extract_key": "host"
      }
    ],
    "limit": 10,
    "page": 1
  }'

Certificate transparency to WHOIS

Discover domains from TLS certificates and look up their WHOIS data.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "vhosts",
        "query": "cert_org:\"Acme Inc\"",
        "extract_key": "san",
        "dedupe": true
      },
      {
        "source": "whois",
        "query": "domain:{$input}"
      }
    ],
    "limit": 10,
    "page": 1
  }'

Aggregations

In addition to returning individual results, a step can aggregate a field's values at the database level. This is useful for getting a summary of the most common values in a field.

Aggregation parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | yes | Field to aggregate on (see table below) |
| type | string | yes | Aggregation type: terms, cardinality or unique_values |
| size | integer | no | Maximum number of buckets to return (default: 20, max: 50) |
| min_count | integer | no | Minimum document count for a bucket to be included |
| use_as_next | boolean | no | Use the aggregation bucket keys as input for the next step |

Aggregatable fields per source

| Source | Available fields |
| --- | --- |
| hosts | host, resolution, as_number, analytics_tags, favicon_hash, tld, cert_subj_org |
| dns | host, value, type, tld |
| whois | domain, registrant_email, registrant_organization, nameservers, tld |
| vhosts | san, resolution, cert_org, cert_common_name |
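To make the size and min_count parameters concrete, the sketch below reproduces what a terms aggregation computes, using local records instead of the database. It illustrates the semantics described above rather than the server's implementation; the function name `terms_buckets` and the `min_count=1` default are assumptions.

```python
from collections import Counter


def terms_buckets(records, field, size=20, min_count=1):
    """Count how many records carry each value of `field`, most common first."""
    counts = Counter()
    for record in records:
        value = record.get(field)
        if value is None:
            continue
        # Multi-valued fields (e.g. nameservers) count once per distinct value.
        values = value if isinstance(value, list) else [value]
        counts.update(list(dict.fromkeys(values)))
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
        if count >= min_count
    ]
    return buckets[:size]


def bucket_keys(buckets):
    """Values that would feed the next step when use_as_next is true."""
    return [bucket["key"] for bucket in buckets]
```

With `use_as_next` set, the keys returned by `bucket_keys` play the role that extracted values play in a regular step.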

Example with aggregation

Find the top registrant emails across domains that share a specific nameserver, then look up all domains using those emails.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "nameservers:\"ns1.custom-dns.com\"",
        "aggregate": {
          "field": "registrant_email",
          "type": "terms",
          "size": 10,
          "min_count": 2,
          "use_as_next": true
        }
      },
      {
        "source": "whois",
        "query": "registrant_email:{$input}",
        "extract_key": "domain"
      }
    ],
    "limit": 10,
    "page": 1
  }'

Response format

The response contains the results of each step in order, along with metadata about execution time and pagination.

{
  "total_steps": 2,
  "took_ms": 342,
  "steps": [
    {
      "source": "whois",
      "query": "registrant_organization:\"Acme Corp\"",
      "count": 12,
      "total": 12,
      "total_pages": 1,
      "page": 1,
      "extracted_keys": ["acme.com", "acme.net", "acme.org"],
      "data": [
        {
          "domain": "acme.com",
          "registrant_organization": "Acme Corp",
          "registrant_email": "[email protected]",
          "nameservers": ["ns1.acme.com", "ns2.acme.com"],
          "expiration_date": "2027-01-15T00:00:00Z"
        }
      ]
    },
    {
      "source": "dns",
      "query": "(host:\"acme.com\" OR host:\"*.acme.com\" ...",
      "count": 5,
      "total": 5,
      "total_pages": 1,
      "page": 1,
      "data": [
        {
          "host": "old.acme.com",
          "type": "CNAME",
          "value": "orphaned.thirdparty.com"
        }
      ]
    }
  ]
}

When using aggregations, each step also includes buckets and aggregations fields:

{
  "source": "whois",
  "count": 50,
  "total": 200,
  "buckets": [
    { "key": "[email protected]", "doc_count": 45 },
    { "key": "[email protected]", "doc_count": 12 }
  ],
  "aggregations": { ... }
}

When using stream_mode (available for single-step pipelines only), the step includes pagination cursors:

{
  "source": "dns",
  "count": 50,
  "total": 500,
  "has_more": true,
  "next_cursor": "eyJwaXRfaWQiOi..."
}

Pass the next_cursor value as the cursor parameter in the next request to load the following page.
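The cursor hand-off described above amounts to a simple loop. In this sketch, `fetch` is a hypothetical callable that sends a request body and returns the decoded JSON response; the loop assumes that has_more and next_cursor sit on the first (and only) step, as in the example above.

```python
def iter_stream_pages(fetch, payload):
    """Yield each response page, following next_cursor until has_more is false.

    `fetch` takes the request body (a dict) and returns the decoded response.
    """
    body = dict(payload, stream_mode=True)  # copy so the caller's dict is untouched
    while True:
        response = fetch(body)
        yield response
        step = response["steps"][0]  # stream_mode is single-step only
        cursor = step.get("next_cursor")
        if not step.get("has_more") or not cursor:
            break
        # Pass the token back as the top-level cursor parameter.
        body = dict(body, cursor=cursor)
```

Because this is a generator, a caller can stop iterating early and avoid spending quota on pages it does not need.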

Limits

| Limit | Basic | Premium |
| --- | --- | --- |
| Max steps per pipeline | 3 | 5 |
| Max results per step | 15 | 50 |
| Max chained values between steps | 20 | 100 |
| Max aggregation buckets | 10 | 50 |
| Max exclude patterns per step | 10 | 10 |
| Stream mode (cursor pagination) | - | yes |

Each step in a pipeline counts as one query against your quota, and the results returned are deducted from your results quota.
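Since every step consumes quota, it can be worth checking a request against the plan limits before sending it. The sketch below is a hypothetical pre-flight check built from the table above. Whether the API rejects or silently caps out-of-range values is not stated here, so the helper only reports what exceeds the documented maximums, and it skips the exclude-pattern limit because the request shape for multiple patterns is not documented in this page.

```python
# Limits copied from the table above.
PLAN_LIMITS = {
    "basic": {"steps": 3, "results": 15, "chained": 20, "buckets": 10, "stream_mode": False},
    "premium": {"steps": 5, "results": 50, "chained": 100, "buckets": 50, "stream_mode": True},
}


def check_plan_limits(payload, plan):
    """Return a list of documented limits that the request body exceeds."""
    limits = PLAN_LIMITS[plan]
    problems = []
    steps = payload.get("steps", [])
    if len(steps) > limits["steps"]:
        problems.append(f"{len(steps)} steps exceeds the {plan} maximum of {limits['steps']}")
    limit = payload.get("limit")
    if limit is not None and limit > limits["results"]:
        problems.append(f"limit {limit} exceeds the {plan} maximum of {limits['results']}")
    if payload.get("stream_mode") and not limits["stream_mode"]:
        problems.append(f"stream_mode is not available on the {plan} plan")
    for index, step in enumerate(steps, start=1):
        max_extract = step.get("max_extract")
        if max_extract is not None and max_extract > limits["chained"]:
            problems.append(f"step {index}: max_extract {max_extract} exceeds {limits['chained']}")
        size = step.get("aggregate", {}).get("size")
        if size is not None and size > limits["buckets"]:
            problems.append(f"step {index}: aggregate size {size} exceeds {limits['buckets']}")
    return problems
```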

Frequently asked questions

What happens when a step returns no results?

The pipeline continues executing the remaining steps, but they will receive no input values. Their results will be empty. The pipeline does not fail - you still get the response with all step results, allowing you to see where the chain broke.
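Because the response still lists every step, locating the break is a short scan. The sketch below assumes that an empty step reports a count of 0 (or omits the field); the helper name `first_empty_step` is hypothetical.

```python
def first_empty_step(response):
    """Return the 1-based index of the first step with no results, or None."""
    for index, step in enumerate(response.get("steps", []), start=1):
        # Assumption: a step without results reports count 0 or no count at all.
        if step.get("count", 0) == 0:
            return index
    return None
```

A return value of 2, for instance, would mean step 1 produced results but nothing matched once its values were injected into step 2.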

Can I use the same source in multiple steps?

Yes. A common pattern is WHOIS → WHOIS: find a domain's nameservers, then find other domains using those same nameservers.

How does the {$input} placeholder work exactly?

The placeholder is replaced by an OR clause matching all extracted values from the previous step. For example, if step 1 extracted ["a.com", "b.com"] and step 2 has host:{$input}, the actual query becomes:

((host:"a.com" OR host:"*.a.com") OR (host:"b.com" OR host:"*.b.com"))

For non-host fields like resolution:{$input}, the values are simply joined with OR:

(resolution:"1.2.3.4" OR resolution:"5.6.7.8")
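The substitution rule can be reproduced locally to preview the query a step will run. This is a sketch of the behavior described above, not the server's code: the function name `expand_input` is hypothetical, and treating only the field named `host` as a host field is an assumption.

```python
import re


def expand_input(query, values, host_fields=("host",)):
    """Replace every field:{$input} in `query` with an OR clause over `values`."""

    def clause(match):
        field = match.group(1)
        if field in host_fields:
            # Host fields also match all subdomains of each value.
            parts = [f'({field}:"{v}" OR {field}:"*.{v}")' for v in values]
        else:
            parts = [f'{field}:"{v}"' for v in values]
        return "(" + " OR ".join(parts) + ")"

    return re.sub(r"(\w+):\{\$input\}", clause, query)


print(expand_input("host:{$input}", ["a.com", "b.com"]))
print(expand_input("resolution:{$input}", ["1.2.3.4", "5.6.7.8"]))
```

The two printed lines match the two expanded queries shown above. Any other terms in the query, such as `AND type:CNAME`, are left untouched.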

What is the difference between exclude and NOT in the query?

Both work, but they operate at different levels:

  • NOT value:"x" in the query filters results at the database level - excluded documents are never returned.
  • exclude: ["x"] filters extracted keys after the search - the documents still appear in the step's data, but the excluded values won't be passed to the next step.

Use NOT in the query when you don't want to see the results at all. Use exclude when you want to see the results but prevent specific values from propagating to subsequent steps.
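The post-search side of this distinction can be pictured as a filter applied to the extracted keys before they reach the next step. The sketch below combines exclude, exclude_pattern, dedupe and max_extract in one hypothetical helper. The order of operations, exact-match comparison for exclude, and shell-style wildcard matching are all assumptions; only the case-insensitive dedupe comes from the parameter table.

```python
from fnmatch import fnmatchcase


def filter_extracted(values, exclude=(), exclude_pattern=None, dedupe=False, max_extract=None):
    """Return the extracted keys that would be passed on to the next step."""
    excluded = set(exclude)
    kept, seen = [], set()
    for value in values:
        if value in excluded:
            continue  # exclude: drop listed values (assumed exact match)
        if exclude_pattern and fnmatchcase(value, exclude_pattern):
            continue  # exclude_pattern: drop values matching the wildcard
        if dedupe:
            lowered = value.lower()  # dedupe is documented as case-insensitive
            if lowered in seen:
                continue
            seen.add(lowered)
        kept.append(value)
    return kept if max_extract is None else kept[:max_extract]
```

Note that this only affects what propagates; the documents behind the dropped values would still appear in the step's data.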

Can I combine aggregation with regular results?

Yes. When you add an aggregate object to a step, you get both the regular data array (paginated documents) and the buckets array (aggregation results). If you set use_as_next: true, the aggregation bucket keys are used as input for the next step instead of values extracted from individual documents.