[IRONSCALES] Initial release of IRONSCALES integration #15982
akshraj-crest wants to merge 6 commits into elastic:main from
Conversation
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as
All relevant
Based on the API behavior, the
I think convert processors used for string conversion do not require
The cleanup script is executed at the end of the ingest pipeline, ensuring that all ECS-required fields are already populated and not impacted.
CDR fields have not been mapped.
@akshraj-crest, we are running some tests atm with an internal review-bot, please don't consider this as an actual review, someone else will do a full review later. This is just to see how generic or accurate the bot is behaving atm.
Pull request overview
This PR introduces the initial release of the IRONSCALES integration for Elastic, enabling collection and visualization of email security incident data from the IRONSCALES anti-phishing platform.
Changes:
- Added IRONSCALES integration package with incident data stream
- Implemented CEL-based API connector with JWT authentication and pagination
- Created Elasticsearch transform for maintaining latest incident states
- Added dashboard with visualizations for incident analysis
Reviewed changes
Copilot reviewed 31 out of 36 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/ironscales/manifest.yml | Package configuration defining integration metadata, deployment modes (agentless/agent-based), and input parameters |
| packages/ironscales/data_stream/incident/* | Complete incident data stream including CEL configuration, ingest pipeline, field definitions, and ILM policy |
| packages/ironscales/elasticsearch/transform/latest_incident/* | Transform configuration to maintain latest incident states with 30-day retention |
| packages/ironscales/kibana/dashboard/* | Dashboard and saved searches for incident visualization and analysis |
| packages/ironscales/docs/README.md | Comprehensive documentation including setup instructions, API usage, and example events |
| packages/ironscales/changelog.yml | Version 0.1.0 changelog entry |
| .github/CODEOWNERS | Added ownership to security-service-integrations team |
ShourieG left a comment
🤖 AI-Generated Review | Elastic Integration PR Review Bot
⚠️ This is an automated review generated by an AI assistant. Please verify all suggestions before applying changes. This review does not represent a human reviewer's opinion.
PR Review | elastic/integrations #15982
Field Mapping
Data Stream: incident (package: ironscales)
File: packages/ironscales/data_stream/incident/fields/fields.yml
Issue 1: Email body/subject fields may need text type for search
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/fields/fields.yml lines 48, 50, 97
Problem: Free-form text fields (email_body_text, email_subject, original_email_body) use type: keyword which limits full-text search capability.
Recommendation:
```yaml
- name: email_body_text
  type: text
  fields:
    keyword:
      type: keyword
      ignore_above: 1024
```

Issue 2: Missing metric_type annotations on count fields
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/fields/fields.yml lines 8, 23, 38, 55-63, 88, 103-105
Problem: Count fields like affected_mailboxes_count, attachments_count, comments_count are missing metric_type annotations for proper metrics handling.
Recommendation:
```yaml
- name: affected_mailboxes_count
  type: long
  metric_type: gauge
```

Pipeline
Data Stream: incident (package: ironscales)
File: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml
Issue 1: Missing Event Categorization Fields
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml lines 48-51
Problem: event.category and event.type are not populated. Only event.kind: event is set. For a security incident integration, these fields are valuable for filtering and categorization.
Recommendation:
```yaml
- set:
    field: event.category
    value: [email, threat]
- set:
    field: event.type
    value: [info]
```

Issue 2: Missing related.hash Field
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml lines 547-551
Problem: MD5 hashes from email attachments are not added to related.hash, limiting correlation capabilities for threat hunting.
Recommendation:
```yaml
- foreach:
    field: ironscales.incident.attachments
    if: ctx.ironscales?.incident?.attachments != null
    processor:
      append:
        field: related.hash
        value: "{{{_ingest._value.md5}}}"
        allow_duplicates: false
```

Issue 3: Incorrect Use of event.created
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml lines 478-482
Problem: ironscales.incident.created is mapped to event.created. According to ECS, event.created should be when the event was created in the agent/pipeline. The incident creation time should map to @timestamp instead.
Recommendation:
```yaml
- set:
    field: '@timestamp'
    copy_from: ironscales.incident.created
    if: ctx.ironscales?.incident?.created != null
```

Issue 4: Missing error handling on JSON processor
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 42
Problem: Malformed JSON causes entire document to fail without error handling.
Recommendation:
```yaml
- json:
    field: message
    target_field: ironscales.incident
    tag: json_decode_ironscales_incident
    on_failure:
      - append:
          field: error.message
          value: "Failed to parse JSON: {{{_ingest.on_failure_message}}}"
```

Issue 5: Missing error handling on camelCase script
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 54
Problem: Script failure loses entire document without error handling.
Recommendation:
```yaml
- script:
    lang: painless
    tag: script_convert_camelcase_to_snakecase
    source: |
      [script content]
    on_failure:
      - append:
          field: error.message
          value: "Failed to convert camelCase fields: {{{_ingest.on_failure_message}}}"
```

Issue 6: Inefficient Painless script for camelCase conversion
Severity: 🟠 High
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 54
Problem: Complex recursive script processes entire document on every event with inefficient string concatenation in loops.
Recommendation:
Move field name normalization to CEL input stage before ingestion to avoid expensive recursive processing in the pipeline.
Issue 7: Unconditional recursive null value cleanup script
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 710
Problem: Recursively walks entire document structure on every event without conditional execution.
Recommendation:
Add conditional execution or use ignore_empty_value on individual processors instead of global cleanup.
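As a sketch of the conditional-execution option (the `if` condition is an assumption based on the target field used elsewhere in this pipeline, and `[script content]` stands in for the existing cleanup script):

```yaml
# Sketch: gate the cleanup script so it only runs when the parsed
# object exists, instead of walking every document unconditionally.
- script:
    lang: painless
    tag: script_cleanup_null_values
    if: ctx.ironscales?.incident != null
    source: |
      [script content]
```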
Issue 8: Incomplete null check in IP conversion
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 429
Problem: Null values may cause unexpected behavior in IP conversion.
Recommendation:
```yaml
- convert:
    field: ironscales.incident.mail_server.ip
    type: ip
    if: ctx.ironscales?.incident?.mail_server?.ip != null && ctx.ironscales.incident.mail_server.ip != ''
    ignore_missing: true
```

Issue 9: Missing error handling on cleanup script
Severity: 🟡 Medium
Location: packages/ironscales/data_stream/incident/elasticsearch/ingest_pipeline/default.yml line 710
Problem: Cleanup failure loses all processed data.
Recommendation:
```yaml
- script:
    lang: painless
    tag: script_cleanup_null_values
    source: |
      [script content]
    on_failure:
      - append:
          field: error.message
          value: "Failed to cleanup null values: {{{_ingest.on_failure_message}}}"
```

💡 Suggestions
- Run CEL program through celfmt for standard formatting (trailing commas, consistent parentheses)
- Add explicit 429 rate limiting handling in CEL input configuration
- Delete redundant remove processor after rename operation (line 35)
- Add missing on_failure handlers on string conversions (lines 415, 421)
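For the last suggestion, a minimal sketch of an on_failure handler on a string conversion (the field name here is illustrative, not taken from the actual pipeline at lines 415 and 421):

```yaml
# Hypothetical example: capture conversion failures instead of
# dropping the document. The field name is an assumption.
- convert:
    field: ironscales.incident.incident_id
    type: string
    ignore_missing: true
    on_failure:
      - append:
          field: error.message
          value: "Processor {{{_ingest.on_failure_processor_type}}} with tag {{{_ingest.on_failure_processor_tag}}} failed: {{{_ingest.on_failure_message}}}"
```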
Input Configuration
Data Stream: incident (package: ironscales)
File: packages/ironscales/data_stream/incident/agent/stream/cel.yml.hbs
Issue 1: CEL program formatting does not match celfmt standard
Severity: 🔵 Low
Location: packages/ironscales/data_stream/incident/agent/stream/cel.yml.hbs line 28
Problem: CEL program formatting lacks consistent spacing, parentheses placement, and trailing commas that improve readability.
Recommendation:
Run the CEL program through celfmt to apply standard formatting with trailing commas in objects, wrapped conditions in parentheses, and consistent indentation.
Issue 2: No explicit rate limiting handling for 429 status codes
Severity: 🔵 Low
Location: packages/ironscales/data_stream/incident/agent/stream/cel.yml.hbs line 88
Problem: While the default 24-hour interval and page_size=100 likely keep requests within limits, explicit handling would improve resilience.
Recommendation:
# After 403 handling, add:
: (resp.StatusCode == 429) ?
{
"events": {"error": {"message": "Rate limit exceeded"}},
"next": {"jwt_token": token.next.jwt_token, ?"page": state.?next.page},
"worklist": state.?worklist.orValue({}),
"want_more": false
}Transform
Package: ironscales
File: packages/ironscales/elasticsearch/transform/latest_incident/transform.yml
Issue 1: Transform frequency too aggressive (30s)
Severity: 🟡 Medium
Location: packages/ironscales/elasticsearch/transform/latest_incident/transform.yml line 22
Problem: 30-second frequency may cause excessive resource usage and unnecessary processing overhead.
Recommendation:
```yaml
frequency: 1m
```

Issue 2: Non-standard destination index naming convention
Severity: 🔵 Low
Location: packages/ironscales/elasticsearch/transform/latest_incident/transform.yml line 11
Problem: Destination index naming doesn't follow clear conventions.
Recommendation:
Consider using a clearer naming pattern like logs-ironscales.incident_latest-default
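In transform.yml, the suggested pattern would look roughly like this (a sketch only; the exact index name is the bot's suggestion, not a project requirement):

```yaml
# Suggested destination index following the logs-<package>.<name>-<namespace>
# convention; adjust to match the package's actual naming.
dest:
  index: logs-ironscales.incident_latest-default
```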
File: packages/ironscales/elasticsearch/transform/latest_incident/fields/fields.yml
Issue 3: Missing description attributes on all transform output fields
Severity: 🔵 Low
Location: packages/ironscales/elasticsearch/transform/latest_incident/fields/fields.yml line 7
Problem: All fields are missing description attributes, which are important for understanding transform output.
Recommendation:
```yaml
- name: affected_mailboxes_count
  type: long
  description: Count of mailboxes affected by this incident from the transform aggregation.
```

Issue 4: Group fields may need normalize: [array] attribute
Severity: 🟡 Medium
Location: packages/ironscales/elasticsearch/transform/latest_incident/fields/fields.yml lines 11, 78, 114
Problem: Fields like attachments, links, and reports are defined as group types but may actually be arrays in the transform output.
Recommendation:
```yaml
- name: attachments
  type: group
  normalize:
    - array
  fields:
    - name: file_name
      type: keyword
      description: Name of the attached file.
```

File: packages/ironscales/elasticsearch/transform/latest_incident/fields/is-transform-source-false.yml
Issue 5: Transform field definition contains source data stream field
Severity: 🟠 High
Location: packages/ironscales/elasticsearch/transform/latest_incident/fields/is-transform-source-false.yml line 1
Problem: This field definition belongs in the source data stream fields directory, not in transform output fields. Transform field definitions should only contain aggregated output fields.
Recommendation:
Move this field definition to packages/ironscales/data_stream/incident/fields/labels.yml
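For context, a file with this name in other packages typically carries the transform-side override of the source label, along the lines of the following (assumed content based on the common pattern, not verified against this PR):

```yaml
# Typical content of is-transform-source-false.yml in packages that
# pair a "true" label on the source data stream with "false" on the
# transform destination. Assumed, not confirmed for this package.
- name: labels.is_transform_source
  type: constant_keyword
  value: "false"
```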
File: packages/ironscales/elasticsearch/transform/latest_incident/fields/beats.yml
Issue 6: Transform field definitions contain Filebeat input fields
Severity: 🟡 Medium
Location: packages/ironscales/elasticsearch/transform/latest_incident/fields/beats.yml line 1
Problem: The fields defined (input.type and log.offset) are Filebeat-specific input fields, NOT typical transform output fields. These should be in the source data stream, not transform output.
Recommendation:
Verify this is actually a transform and define the actual OUTPUT fields (aggregations, group_by results) instead of source input fields.
Summary
| Severity | Count |
|---|---|
| 🔴 Critical | 0 |
| 🟠 High | 2 |
| 🟡 Medium | 15 |
| 🔵 Low | 4 |
Total Actionable Items: 21
/test

🚀 Benchmarks report
To see the full report comment with
💚 Build Succeeded
These inputs can be used in this integration:

- [cel](https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-cel)
```diff
- - [cel](https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-cel)
+ - [CEL](https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-cel)
```
```diff
@@ -0,0 +1,3 @@
+dependencies:
+  ecs:
+    reference: git@v9.2.0
```
```
body.with({
  ?"detailed_classification": has(body.classification) ? optional.of(body.classification) : optional.none()
```
```yaml
- set:
    tag: set_ecs_version_to_9_2_0_3273339c
    field: ecs.version
    value: 9.2.0
```
```
work.?worklist.incidents[0].hasValue() ?
  request(
    "GET",
    state.url.trim_right("/") + "/appapi/incident/" + state.company_id + "/details/" + string(int(work.worklist.incidents[0].incidentID))
```
I'm very worried about the scalability of this polling model.
Each interval, we list all incidents and then call /details per incident. The number of API calls can grow very large for a large firm, making the integration unusable.
Please check if there is an alternative way to fetch the incidents details. (another API, webhooks, export to another storage say S3, etc.)
I agree that this approach has scalability limitations. However, due to the current API behavior, we cannot reliably fetch only updated incidents (e.g., when an incident's classification changes), so we rely on a full sync each interval to ensure we have the latest incidents.
As per our current understanding:
- There does not appear to be an alternative API to fetch incidents with details in bulk.
- The `/details` endpoint seems to support only one incident per call.
- We have not found any webhook or export-based alternative so far.
```yaml
- name: email_subject
  type: text
```
This should be a multi field similar to https://www.elastic.co/docs/reference/ecs/ecs-email#field-email-subject
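A minimal sketch of the suggested multi-field mapping, mirroring the ECS `email.subject` definition (the `ignore_above` value is an assumption, matching the limit used elsewhere in this review):

```yaml
# Sketch: text field with a keyword multi-field for aggregations,
# following the ECS email.subject pattern.
- name: email_subject
  type: text
  multi_fields:
    - name: keyword
      type: keyword
      ignore_above: 1024
```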
```yaml
- external: ecs
  name: observer.vendor
  type: constant_keyword
  value: IRONSCALES
```
Can you keep this in another file? That way, if this file is later deleted, the overridden field will still be kept.
Also add:
- `Incidents over time` trend.
- Metric/Pie on `sender_is_internal` flag
- Trend on `affected_mailbox_count`
- Top `reportedBy`, `resolvedBy`
- Is there `status` - whether it is active/closed?
Thanks for the suggestions.
I agree with the 2nd (Metric/Pie on sender_is_internal) and 4th (Top reportedBy, resolvedBy) suggestions.
Regarding the `status` field, there is no explicit field in the response that indicates whether an incident is active or closed.
One consideration regarding the trend-based visualizations is that incident updates do not appear to modify a reliable timestamp field in the response. Since we are using a full sync + transform approach, the @timestamp available here is event.ingested. As a result, trend visualizations may not accurately reflect the actual timeline of the incidents.
Proposed commit message
The initial release includes incident data stream, associated dashboards
and visualizations.
IRONSCALES fields are mapped to their corresponding ECS fields where possible.
Test samples were derived from documentation and live data samples,
which were subsequently sanitized.
Checklist
changelog.yml file.

How to test this PR locally
Related issues
Screenshots
Go Code for Ingest Pipeline Generation
The incident data stream pipeline is generated using Go code built on top of the Dispear library.
Below is the code used for generating the pipeline logic: