Pipeline search

Pipeline search lets you chain multiple searches across different datasets in a single request. Each step can extract values from its results and inject them into the next step's query, so you can correlate data across datasets without writing custom scripts.

For example, you can find all domains registered by an organization in WHOIS, then look up their DNS records, and finally check which hosts are online - all in one API call.

warning

Pipeline search requires a Basic account at minimum. Basic users are limited to 3 steps, 15 results per step and 20 chained values. Premium subscribers get up to 5 steps, 50 results per step and 100 chained values.

tip

Before running the example commands below, store your API key in a variable, replacing xxx with your own key. See the documentation on retrieving your API key if you do not have it yet.

API_KEY=xxx

How it works

A pipeline is a sequence of steps. Each step queries one of the four datasets (hosts, dns, whois, vhosts) and optionally extracts a field from the results. The extracted values are then injected into the next step's query through the {$input} placeholder.

Step 1: WHOIS search → extract "domain"
         ↓ [acme.com, acme.net, acme.org]
Step 2: DNS search (host:{$input}) → extract "value"
         ↓ [1.2.3.4, 5.6.7.8]
Step 3: Hosts search (resolution:{$input})
         ↓ final results

When {$input} is replaced, the pipeline automatically builds an OR clause matching each extracted value. For host fields, it also matches all subdomains (e.g. host:"acme.com" OR host:"*.acme.com").
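As a rough illustration, the three-step flow in the diagram could be written as the steps list below. This is only a sketch: the diagram does not give the WHOIS query for step 1, so the `registrant_organization:"Acme Corp"` query is an assumed example, and the `pipeline_steps` name is hypothetical.

```python
import json

# Steps mirroring the diagram above. The WHOIS query in step 1 is an
# assumed example; the diagram itself does not specify it.
pipeline_steps = [
    {"source": "whois", "query": 'registrant_organization:"Acme Corp"', "extract_key": "domain"},
    {"source": "dns", "query": "host:{$input}", "extract_key": "value"},
    # The last step has no extract_key because nothing follows it.
    {"source": "hosts", "query": "resolution:{$input}"},
]

# Serialize the steps into a JSON request body.
body = json.dumps({"steps": pipeline_steps, "limit": 10, "page": 1}, indent=2)
print(body)
```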

Request format

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [...],
    "limit": 10,
    "page": 1
  }'
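For readers calling the endpoint from a script rather than curl, the same request could be sketched in Python with only the standard library. The function names `build_request` and `run_pipeline` are hypothetical helpers, not part of any official client, and error handling is left out.

```python
import json
import urllib.request

PIPELINE_URL = "https://api.profundis.io/api/v2/data/auth/pipeline/search"


def build_request(api_key, steps, limit=10, page=1):
    """Build (but do not send) the POST request for a pipeline search."""
    body = json.dumps({"steps": steps, "limit": limit, "page": page}).encode("utf-8")
    return urllib.request.Request(
        PIPELINE_URL,
        data=body,
        method="POST",
        headers={"X-API-KEY": api_key, "content-type": "application/json"},
    )


def run_pipeline(api_key, steps, limit=10, page=1, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(api_key, steps, limit, page)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the body and headers without spending a query from your quota.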

Top-level parameters

Parameter	Type	Default	Description
`steps`	array	required	Ordered list of pipeline steps (1 to 5)
`limit`	integer	50	Maximum results returned per step
`page`	integer	1	Page number for pagination
`stream_mode`	boolean	false	Enable cursor-based pagination for large result sets (Premium+, single-step only)
`cursor`	string	-	Opaque pagination token from a previous response (used with `stream_mode`)

Step parameters

Each object in the steps array accepts the following parameters.

Parameter	Type	Required	Description
`source`	string	yes	Dataset to query: `hosts`, `dns`, `whois` or `vhosts`
`query`	string	yes	Search query. Use `{$input}` as a placeholder for values extracted from the previous step
`extract_key`	string	yes*	Field to extract from results for the next step. Required on all steps except the last one
`aggregate`	object	no	Aggregation to perform on this step (see Aggregations)
`exclude`	array	no	List of values to exclude from results and extracted keys
`exclude_pattern`	string	no	Wildcard pattern to exclude (e.g. `*.cloudflare.com`)
`dedupe`	boolean	no	Deduplicate extracted keys (case-insensitive)
`max_extract`	integer	no	Limit the number of values passed to the next step (default: the maximum allowed by your plan)

Extractable fields per source

These are the fields you can use as extract_key to chain values between steps.

Source	Available extract keys
`hosts`	`host`, `resolution`, `as_number`, `favicon_hash`, `analytics_tags`, `cert_subj_org`
`dns`	`host`, `value`, `type`
`whois`	`domain`, `registrant_email`, `registrant_organization`, `nameservers`
`vhosts`	`san`, `resolution`, `cert_org`, `cert_common_name`

info

When extracting nameservers, public nameservers (Cloudflare, Akamai, AWS, etc.) are automatically filtered out since they don't indicate a meaningful relationship between domains.
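The two tables above can be turned into a small client-side check that catches mistakes before a request is sent. The sketch below is an assumption about what is worth checking, not a description of the server's own validation; `validate_steps` is a hypothetical helper. It also assumes that a step using `aggregate` with `use_as_next` does not need an `extract_key`, which is inferred from the aggregation example on this page rather than stated in the parameter table.

```python
ALLOWED_EXTRACT_KEYS = {
    "hosts": {"host", "resolution", "as_number", "favicon_hash", "analytics_tags", "cert_subj_org"},
    "dns": {"host", "value", "type"},
    "whois": {"domain", "registrant_email", "registrant_organization", "nameservers"},
    "vhosts": {"san", "resolution", "cert_org", "cert_common_name"},
}


def validate_steps(steps):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for index, step in enumerate(steps, start=1):
        source = step.get("source")
        if source not in ALLOWED_EXTRACT_KEYS:
            problems.append(f"step {index}: unknown source {source!r}")
            continue
        if not step.get("query"):
            problems.append(f"step {index}: missing query")
        key = step.get("extract_key")
        is_last = index == len(steps)
        # Assumption: bucket keys replace extract_key when use_as_next is set.
        uses_buckets = step.get("aggregate", {}).get("use_as_next", False)
        if key is None:
            if not is_last and not uses_buckets:
                problems.append(f"step {index}: extract_key is required before the last step")
        elif key not in ALLOWED_EXTRACT_KEYS[source]:
            problems.append(f"step {index}: {key!r} is not extractable from {source}")
    return problems
```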

Examples

Find CNAME records that don't point to a specific target

Search all domains owned by an organization, then find DNS CNAME records that don't point to the expected value - useful for detecting potential subdomain takeovers or misconfigurations.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "registrant_organization:\"Acme Corp\"",
        "extract_key": "domain"
      },
      {
        "source": "dns",
        "query": "host:{$input} AND type:CNAME AND NOT value:\"expected-target.acme.com\"",
        "extract_key": "value"
      }
    ],
    "limit": 10,
    "page": 1
  }'

You can also exclude an entire pattern from the CNAME targets, either with NOT value:*.amazonaws.com in the query or with the exclude_pattern parameter:

{
  "source": "dns",
  "query": "host:{$input} AND type:CNAME",
  "extract_key": "value",
  "exclude_pattern": "*.amazonaws.com"
}

Correlate shared infrastructure via nameservers

Find domains that share the same private nameservers as a target domain - a technique for discovering related assets.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "domain:target.com",
        "extract_key": "nameservers"
      },
      {
        "source": "whois",
        "query": "nameservers:{$input}",
        "extract_key": "domain",
        "exclude": ["target.com"],
        "dedupe": true
      }
    ],
    "limit": 10,
    "page": 1
  }'

DNS to hosts correlation

Find all IPs behind a domain's DNS records, then check what other hostnames resolve to those IPs.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "dns",
        "query": "host:*.target.com AND type:A",
        "extract_key": "value"
      },
      {
        "source": "hosts",
        "query": "resolution:{$input}",
        "extract_key": "host"
      }
    ],
    "limit": 10,
    "page": 1
  }'

Certificate transparency to WHOIS

Discover domains from TLS certificates and look up their WHOIS data.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "vhosts",
        "query": "cert_org:\"Acme Inc\"",
        "extract_key": "san",
        "dedupe": true
      },
      {
        "source": "whois",
        "query": "domain:{$input}"
      }
    ],
    "limit": 10,
    "page": 1
  }'

Aggregations

In addition to individual results, a step can return an aggregation computed by the database over one field. This is useful for getting a summary of the most common values in that field.

Aggregation parameters

Parameter	Type	Required	Description
`field`	string	yes	Field to aggregate on (see table below)
`type`	string	yes	Aggregation type: `terms`, `cardinality` or `unique_values`
`size`	integer	no	Maximum number of buckets to return (default: 20, max: 50)
`min_count`	integer	no	Minimum document count for a bucket to be included
`use_as_next`	boolean	no	Use the aggregation bucket keys as input for the next step

Aggregatable fields per source

Source	Available fields
`hosts`	`host`, `resolution`, `as_number`, `analytics_tags`, `favicon_hash`, `tld`, `cert_subj_org`
`dns`	`host`, `value`, `type`, `tld`
`whois`	`domain`, `registrant_email`, `registrant_organization`, `nameservers`, `tld`
`vhosts`	`san`, `resolution`, `cert_org`, `cert_common_name`

Example with aggregation

Find the top registrant emails across domains that share a specific nameserver, then look up all domains using those emails.

curl "https://api.profundis.io/api/v2/data/auth/pipeline/search" \
  -H "X-API-KEY: $API_KEY" \
  -H "content-type: application/json" \
  -X POST \
  -d '{
    "steps": [
      {
        "source": "whois",
        "query": "nameservers:\"ns1.custom-dns.com\"",
        "aggregate": {
          "field": "registrant_email",
          "type": "terms",
          "size": 10,
          "min_count": 2,
          "use_as_next": true
        }
      },
      {
        "source": "whois",
        "query": "registrant_email:{$input}",
        "extract_key": "domain"
      }
    ],
    "limit": 10,
    "page": 1
  }'
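To make the effect of `size` and `min_count` concrete, here is a small local model of a `terms` aggregation. It assumes `terms` groups documents by field value and ranks the groups by document count, which is how such aggregations usually behave; the service's actual implementation is not documented here, and `terms_buckets` is a hypothetical name.

```python
from collections import Counter


def terms_buckets(documents, field, size=20, min_count=1):
    """Count field values across documents and return the top buckets."""
    counts = Counter()
    for document in documents:
        value = document.get(field)
        if value is None:
            continue
        # A field may hold one value or a list of values (e.g. nameservers).
        for item in value if isinstance(value, list) else [value]:
            counts[item] += 1
    buckets = [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
        if count >= min_count
    ]
    return buckets[:size]
```

With `use_as_next` set to true, the `key` of each surviving bucket is what would be passed to the next step, so raising `min_count` is a way to chain only values that occur repeatedly.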

Response format

The response contains the results of each step in order, along with metadata about execution time and pagination.

{
  "total_steps": 2,
  "took_ms": 342,
  "steps": [
    {
      "source": "whois",
      "query": "registrant_organization:\"Acme Corp\"",
      "count": 12,
      "total": 12,
      "total_pages": 1,
      "page": 1,
      "extracted_keys": ["acme.com", "acme.net", "acme.org"],
      "data": [
        {
          "domain": "acme.com",
          "registrant_organization": "Acme Corp",
          "registrant_email": "admin@acme.com",
          "nameservers": ["ns1.acme.com", "ns2.acme.com"],
          "expiration_date": "2027-01-15T00:00:00Z"
        }
      ]
    },
    {
      "source": "dns",
      "query": "(host:\"acme.com\" OR host:\"*.acme.com\" ...",
      "count": 5,
      "total": 5,
      "total_pages": 1,
      "page": 1,
      "data": [
        {
          "host": "old.acme.com",
          "type": "CNAME",
          "value": "orphaned.thirdparty.com"
        }
      ]
    }
  ]
}
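A response shaped like the one above can be condensed into one line per step, which is handy when checking how many values were chained forward. This is a minimal sketch that relies only on the fields shown in the sample response; `summarize` is a hypothetical helper.

```python
def summarize(response):
    """Return one human-readable line per pipeline step."""
    lines = []
    for index, step in enumerate(response["steps"], start=1):
        # extracted_keys is absent on the last step, so default to an empty list.
        chained = len(step.get("extracted_keys", []))
        lines.append(
            f"step {index} [{step['source']}]: "
            f"{step['count']}/{step['total']} results, {chained} chained values"
        )
    return lines
```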

When using aggregations, each step also includes buckets and aggregations fields:

{
  "source": "whois",
  "count": 50,
  "total": 200,
  "buckets": [
    { "key": "admin@acme.com", "doc_count": 45 },
    { "key": "tech@acme.com", "doc_count": 12 }
  ],
  "aggregations": { ... }
}

When using stream_mode, each step includes pagination cursors:

{
  "source": "dns",
  "count": 50,
  "total": 500,
  "has_more": true,
  "next_cursor": "eyJwaXRfaWQiOi..."
}

Pass the next_cursor value as the cursor parameter in the next request to load the following page.
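The cursor loop described above might look like the following sketch. It makes several assumptions: `fetch` is a hypothetical callable that posts a payload to the pipeline endpoint and returns the decoded JSON, the stream-mode response still carries a `data` array for the step, and `page` can be omitted when a cursor is used.

```python
def iter_stream(fetch, step, limit=50):
    """Yield every document of a single-step pipeline, following cursors."""
    payload = {"steps": [step], "limit": limit, "stream_mode": True}
    while True:
        # stream_mode is single-step only, so the first step is the only one.
        result = fetch(payload)["steps"][0]
        yield from result.get("data", [])
        if not result.get("has_more") or not result.get("next_cursor"):
            break
        # Reuse the same payload and add the opaque token for the next page.
        payload = {**payload, "cursor": result["next_cursor"]}
```

Because the function is a generator, a caller can stop iterating early and avoid requesting pages it does not need.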

Limits

	Basic	Premium
Max steps per pipeline	3	5
Max results per step	15	50
Max chained values between steps	20	100
Max aggregation buckets	10	50
Max exclude patterns per step	10	10
Stream mode (cursor pagination)	-	yes

Each step in a pipeline counts as one query against your quota, and the results returned are deducted from your results quota.
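The limits table can be encoded as a lookup so that a payload is checked against a plan before it is sent. This is a sketch under stated assumptions: the plan keys and the `check_limits` helper are hypothetical, and whether the API rejects or silently caps an over-limit request is not specified on this page.

```python
PLAN_LIMITS = {
    "basic": {"max_steps": 3, "max_results": 15, "max_chained": 20, "max_buckets": 10, "stream_mode": False},
    "premium": {"max_steps": 5, "max_results": 50, "max_chained": 100, "max_buckets": 50, "stream_mode": True},
}


def check_limits(payload, plan):
    """Return the documented limits that the payload would exceed on this plan."""
    limits = PLAN_LIMITS[plan]
    steps = payload.get("steps", [])
    problems = []
    if len(steps) > limits["max_steps"]:
        problems.append(f"too many steps: {len(steps)} > {limits['max_steps']}")
    # Only an explicit limit is checked; an omitted one is left to the server.
    if payload.get("limit", 0) > limits["max_results"]:
        problems.append(f"limit too high: {payload['limit']} > {limits['max_results']}")
    if payload.get("stream_mode") and not limits["stream_mode"]:
        problems.append("stream_mode is not available on this plan")
    for index, step in enumerate(steps, start=1):
        if step.get("max_extract", 0) > limits["max_chained"]:
            problems.append(f"step {index}: max_extract {step['max_extract']} > {limits['max_chained']}")
        size = step.get("aggregate", {}).get("size", 0)
        if size > limits["max_buckets"]:
            problems.append(f"step {index}: aggregate size {size} > {limits['max_buckets']}")
    return problems
```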

Frequently asked questions

What happens when a step returns no results?

The pipeline continues executing the remaining steps, but they will receive no input values. Their results will be empty. The pipeline does not fail - you still get the response with all step results, allowing you to see where the chain broke.
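Locating the point where the chain broke can be automated by scanning the per-step results. The sketch below assumes that a step with no results reports a `count` of 0; `first_empty_step` is a hypothetical helper.

```python
def first_empty_step(response):
    """Return the 1-based index of the first step with no results, or None."""
    for index, step in enumerate(response.get("steps", []), start=1):
        if step.get("count", 0) == 0:
            return index
    return None
```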

Can I use the same source in multiple steps?

Yes. A common pattern is WHOIS → WHOIS: find a domain's nameservers, then find other domains using those same nameservers.

How does the `{$input}` placeholder work exactly?

The placeholder is replaced by an OR clause matching all extracted values from the previous step. For example, if step 1 extracted ["a.com", "b.com"] and step 2 has host:{$input}, the actual query becomes:

((host:"a.com" OR host:"*.a.com") OR (host:"b.com" OR host:"*.b.com"))

For non-host fields like resolution:{$input}, values are simply joined:

(resolution:"1.2.3.4" OR resolution:"5.6.7.8")
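The substitution described above can be reproduced locally to preview the query a step will run. This is a sketch of the documented behaviour, not the service's code: it assumes that only the `host` field receives the subdomain wildcard (the page does not list which fields count as host fields) and it does not escape quotes inside values. `expand_input` and `HOST_FIELDS` are hypothetical names.

```python
HOST_FIELDS = {"host"}  # assumption: only `host` gets the subdomain wildcard


def expand_input(field, values):
    """Build the OR clause that stands in for `field:{$input}`."""
    clauses = []
    for value in values:
        if field in HOST_FIELDS:
            # Match the value itself and any of its subdomains.
            clauses.append(f'({field}:"{value}" OR {field}:"*.{value}")')
        else:
            clauses.append(f'{field}:"{value}"')
    return "(" + " OR ".join(clauses) + ")"


print(expand_input("host", ["a.com", "b.com"]))
print(expand_input("resolution", ["1.2.3.4", "5.6.7.8"]))
```

The two printed lines match the expanded queries shown above for the host and resolution cases.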

What is the difference between `exclude` and `NOT` in the query?

Both work, but they operate at different levels:

NOT value:"x" in the query filters results at the database level - excluded documents are never returned.
exclude: ["x"] filters extracted keys after the search - the documents still appear in the step's data, but the excluded values won't be passed to the next step.

Use NOT in the query when you don't want to see the results at all. Use exclude when you want to see the results but prevent specific values from propagating to subsequent steps.
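The post-search filtering can be pictured as a small function applied to the extracted values before they reach the next step. Everything about the ordering here is an assumption made for illustration: the sketch applies `exclude`, then `exclude_pattern`, then `dedupe`, then `max_extract`, compares `exclude` values case-insensitively, and treats the wildcard as a shell-style glob via Python's `fnmatch`. The service may differ on any of these points, and `filter_extracted` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_extracted(values, exclude=(), exclude_pattern=None, dedupe=False, max_extract=None):
    """Apply the step options to a list of extracted values (local model only)."""
    excluded = {value.lower() for value in exclude}
    seen = set()
    kept = []
    for value in values:
        lowered = value.lower()
        if lowered in excluded:
            continue
        if exclude_pattern and fnmatchcase(lowered, exclude_pattern.lower()):
            continue
        if dedupe:
            # Case-insensitive deduplication keeps the first spelling seen.
            if lowered in seen:
                continue
            seen.add(lowered)
        kept.append(value)
    return kept[:max_extract] if max_extract is not None else kept
```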

Can I combine aggregation with regular results?

Yes. When you add an aggregate object to a step, you get both the regular data array (paginated documents) and the buckets array (aggregation results). If you set use_as_next: true, the aggregation bucket keys are used as input for the next step instead of values extracted from individual documents.
