Skip to content

#850 Add the ability to define rules when redefined fields are valid#851

Merged
yruslan merged 7 commits into
masterfrom
feature/850-add-rule-based-redefines-check
May 28, 2026
Merged

#850 Add the ability to define rules when redefined fields are valid#851
yruslan merged 7 commits into
masterfrom
feature/850-add-rule-based-redefines-check

Conversation

@yruslan
Copy link
Copy Markdown
Collaborator

@yruslan yruslan commented May 28, 2026

Closes #850

Summary by CodeRabbit

  • New Features

    • Automatic filtering of COBOL REDEFINES via boolean rule expressions
    • New redefine_rule / redefine-rule option to map REDEFINE targets to selection expressions
    • Expressions support comparisons, logical operators, string/null/boolean literals, in() and if() functions
  • Documentation

    • README section with examples, option usage, sample output and expression syntax/constraints

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Walkthrough

Adds an expression grammar and evaluator, wires expression evaluators from reader parameters into copybook parsing, annotates AST fields/groups with rule expressions, and gates record extraction so REDEFINES alternatives are selected by evaluated boolean rules.

Changes

REDEFINES Rule Expression Framework

Layer / File(s) Summary
Expression language foundation
cobol-parser/src/main/scala/.../lexer/Lexer.scala, .../lexer/Token.scala, .../parser/ExpressionBuilder.scala, .../parser/Parser.scala
Lexer, token set, and parser interface extended to support string literals, comparison/logical operators, null/true/false literals and NOT handling.
Expression evaluation engine
cobol-parser/src/main/scala/.../expression/ExpressionEvaluator.scala, .../parser/ExpressionBuilderImpl.scala, .../parser/ExtractVariablesBuilder.scala
ExpressionBuilderImpl implements typed evaluation (ints/strings/bools/null), functions in/if, and precedence; ExpressionEvaluator provides typed setters and evalInt/evalBool/evalString and variable extraction.
AST model and transformer
cobol-parser/src/main/scala/.../ast/Statement.scala, Group.scala, Primitive.scala, asttransform/RuleExpressionSetter.scala, exceptions/RuleExpressionParsingException.scala, antlr/ParserVisitor.scala
Statement trait plus Group/Primitive extended with optional ruleExpression and enablement hooks; Primitive adds isUsedInRules flag and copy helpers; RuleExpressionSetter validates mapping and injects evaluators into AST; new exception type for parsing/validation errors.
Configuration and schema wiring
cobol-parser/src/main/scala/.../reader/parameters/CobolParameters.scala, CobolParametersParser.scala, ReaderParameters.scala, CopybookParser.scala, CobolSchema.scala, ReaderParametersValidator.scala
CobolParametersParser parses redefine-rule* options into evaluator strings, builds ExpressionEvaluator instances, ReaderParameters carries a map of evaluators, CopybookParser.parseTree accepts and applies them during AST transformation, and schema building is updated accordingly.
Record extraction gated by rules
cobol-parser/src/main/scala/.../reader/extractors/record/RecordExtractors.scala, .../extractors/raw/FixedWithRecordLengthExprRawRecordExtractor.scala, .../reader/iterator/RecordLengthExpression.scala
RecordExtractors maintains per-extraction variables, introduces canExtract(...) which injects variables into evaluators and returns boolean; when canExtract is false, fields are skipped and return null; gated primitives update variables when isUsedInRules is true; record-length uses evalInt.
Testing and documentation
README.md, cobol-parser/src/test/scala/.../expression/ExpressionEvaluatorSuite.scala, asttransform/RuleExpressionSetterSuite.scala, extract/BinaryExtractorSpec.scala, spark-cobol/src/test/.../ParametersParsingSpec.scala, integration/Test42RedefineRulesSpec.scala
README documents new redefine_rule:<n> usage and expression syntax; unit and integration tests added/updated to cover evaluator, AST transformer validation, parameter parsing, and end-to-end REDEFINES scenarios for text and binary inputs.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • AbsaOSS/cobrix#771: Overlaps in ParserVisitor.visitPrimitive and Primitive constructor edits.

"🐰 I hopped through tokens, operators, and rules,
I set evaluators for redefined schools,
Now fields wake only when my booleans sing,
The right redefine is the rabbit's spring."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main change: adding the ability to define rules for when redefined fields are valid, which accurately reflects the comprehensive feature implementation across the codebase.
Linked Issues check ✅ Passed The implementation fully satisfies issue #850 requirements by adding expression-based rule support for REDEFINES through new parameters, expression evaluation infrastructure, and integration throughout the parsing pipeline.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing rule-based REDEFINES validation. Minor improvements like the README typo fix and test infrastructure updates are natural, in-scope components of the feature implementation.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/850-add-rule-based-redefines-check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 28, 2026

JaCoCo code coverage report - 'cobol-parser'

Overall Project 90.1% -4.17% 🍏
Files changed 58.92% 🍏

File Coverage
RecordLengthExpression.scala 100% -23.53% 🍏
RecordLengthField.scala 100% 🍏
RuleExpressionSetter.scala 97.67% -69.77% 🍏
ExpressionEvaluator.scala 97.5% -2.5% 🍏
Lexer.scala 94.8% -0.78% 🍏
ParserVisitor.scala 89.46% -0.07% 🍏
ExtractVariablesBuilder.scala 85.42% -14.58% 🍏
CopybookParser.scala 84.32% 🍏
RuleExpressionParsingException.scala 83.08% -20% 🍏
CobolSchema.scala 81.29% 🍏
Group.scala 76.83% -9.76%
Primitive.scala 75.99% -0.61% 🍏
ExpressionBuilderImpl.scala 75.07% -27.84% 🍏
Token.scala 74.9%
FixedWithRecordLengthExprRawRecordExtractor.scala 71.12% 🍏
ReaderParametersValidator.scala 65.84% 🍏
RecordExtractors.scala 63.06% -39.08% 🍏
Parser.scala 53.82% -21.78% 🍏
Statement.scala 51.87% 🍏

@github-actions
Copy link
Copy Markdown

JaCoCo code coverage report - 'spark-cobol'

Overall Project 83.4% 🍏

There is no coverage information present for the Files changed

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala (1)

39-80: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Actually throw ExprSyntaxError in the fallback branches.

STATE0, STATE1, and MINUS_NUM currently instantiate ExprSyntaxError but never throw it. That lets malformed input be partially accepted; for example, new ExpressionEvaluator("1 true").evalInt() can return 1 because the trailing TRUE_LITERAL falls through the non-throwing STATE1 default. In this feature that means an invalid redefine rule can survive parsing and then gate extraction with the wrong predicate.

🔧 Minimal fix
-          case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")
+          case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")
...
-          case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")
+          case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")
...
-          case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")
+          case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")

Also applies to: 82-154

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala`
around lines 39 - 80, In the Parser.scala state-machine handlers (states STATE0,
STATE1, MINUS_NUM and the other state match blocks), change the fallback
branches that currently instantiate ExprSyntaxError (e.g. "new
ExprSyntaxError(...)") to actually throw the exception (throw new
ExprSyntaxError(...)); ensure every default/case _ branch in functions handling
tokens throws rather than returns/creates the error so malformed tokens (e.g.
trailing TRUE_LITERAL) cause parsing to fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala`:
- Around line 40-99: ExpressionEvaluator retains prior variable bindings causing
stale values; add a public clearValues() method on ExpressionEvaluator that
empties vars, stringVars and nullVars (i.e., clear mutable.HashMap/HashSet) and
then call expr.clearValues() from RecordExtractors.canExtract() immediately
before setting variables for the current record so each evaluation starts with a
clean binding slate; keep existing setValue/setStringValue/setNullValue and
eval*/getVariables behavior unchanged.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`:
- Around line 607-610: The pattern that converts numeric rule variables using
v.toString.toInt is brittle; update the match in RecordExtractors (the v match
branch that currently calls expr.setStringValue(k, s) or expr.setValue(k,
v.toString.toInt)) to explicitly handle numeric types (e.g., case i: Int =>
expr.setValue(k, i); case l: Long => validate Long fits Int range then
expr.setValue(k, l.toInt); case s: Short|Byte => expr.setValue(k, s.toInt); case
bd: BigInt => check bounds then toInt; and explicitly handle
Float/Double/BigDecimal by either rejecting or converting with clear rounding
policy), and on string inputs parse safely with validation instead of blind
toInt, throwing or logging a clear error when a value is out of Int range or not
an integer.
- Around line 599-613: The ExpressionEvaluator (expr) retains prior variable
values, so before setting current record variables in canExtract you must
clear/reset the evaluator state to avoid stale values; call an available
reset/clear method on field.ruleExpression (e.g., expr.reset() or
expr.clearVariables()) or, if no such API exists, fetch the evaluator's current
variable names and call expr.setNullValue(name) for any name not present in the
incoming variables map, then set the incoming variables via expr.setNullValue /
setStringValue / setValue and finally call expr.evalBool().

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala`:
- Around line 784-802: getRedefineRuleExpressionMapping currently allows later
redefine rules to silently overwrite earlier ones; change it to detect
duplicates on the transformed redefine target and throw an
IllegalArgumentException when a duplicate target is found. Specifically, when
iterating params.getMap in getRedefineRuleExpressionMapping, after computing val
redefine = CopybookParser.transformIdentifier(splitVal(0).trim) check whether
that redefine is already present in the accumulating result (or a local HashSet
of seen targets); if it is, throw IllegalArgumentException describing the
conflicting redefine target and the original option values; otherwise mark the
target as seen and add the (redefine -> rule) pair. Ensure this behavior applies
for keys starting with PARAM_REDEFINE_RULE_PREFIX or
PARAM_REDEFINE_RULE_PREFIX_ALT and still calls params.markUsed(k).

In `@README.md`:
- Around line 1033-1036: Update the README fenced code blocks to include
language identifiers and fix the heading level: add ```scala before the
`.option("redefine_rule:1", "COMPANY => RECORD_TYPE = 'C'")` /
`.option("redefine_rule:2", "PERSON => in(RECORD_TYPE, 'P', 'E')")` block,
change the df example block to ```text (the block starting with `df.show(10)`),
and change the `#### Notes` heading to `### Notes`; apply the same fixes to the
other similar occurrences (the other `.option` blocks and df.show block noted in
the comment).
- Line 1038: Fix the two user-facing typos in the README example: change the
word "lok" to "look" in the sentence "For the above example the load options
will lok like this" and correct the mysterious "RECODD..." token to the intended
"RECORD" (or the correct full identifier used elsewhere) wherever it appears
(including the other occurrence around the mentioned line ~1070); update both
example occurrences so copy/paste guidance reads correctly and consistently.

---

Outside diff comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala`:
- Around line 39-80: In the Parser.scala state-machine handlers (states STATE0,
STATE1, MINUS_NUM and the other state match blocks), change the fallback
branches that currently instantiate ExprSyntaxError (e.g. "new
ExprSyntaxError(...)") to actually throw the exception (throw new
ExprSyntaxError(...)); ensure every default/case _ branch in functions handling
tokens throws rather than returns/creates the error so malformed tokens (e.g.
trailing TRUE_LITERAL) cause parsing to fail.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 81ca08a9-7e73-4e74-9e58-a6bee8b1e4bf

📥 Commits

Reviewing files that changed from the base of the PR and between 8750ebb and 41b9590.

📒 Files selected for processing (30)
  • README.md
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Group.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Statement.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/RuleExpressionSetter.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/exceptions/RuleExpressionParsingException.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/lexer/Lexer.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/lexer/Token.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExpressionBuilder.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExpressionBuilderImpl.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExtractVariablesBuilder.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/NumExprBuilderImpl.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedWithRecordLengthExprRawRecordExtractor.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/RecordLengthExpression.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/RecordLengthField.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/validator/ReaderParametersValidator.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/expression/ExpressionEvaluatorSuite.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/asttransform/RuleExpressionSetterSuite.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala
💤 Files with no reviewable changes (1)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/NumExprBuilderImpl.scala

Comment thread README.md Outdated
Comment thread README.md Outdated
…, and support decimal values in rule expressions (Thanks @coderabbitai).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala (1)

269-271: ⚡ Quick win

Use decimal literals in the decimal-rule test to validate the actual feature.

Line 269-Line 271 currently use integer literals, so this case doesn’t strictly prove decimal literal handling in rule expressions.

Proposed test adjustment
-          .option("redefine-rule:1", "G1 => in(ID, 1, null)")
-          .option("redefine-rule:2", "G2 => ID=2 && true")
-          .option("redefine-rule:3", "G3 => ID = 3 || ID = null")
+          .option("redefine-rule:1", "G1 => in(ID, 1.1, null)")
+          .option("redefine-rule:2", "G2 => ID = 2.2 && true")
+          .option("redefine-rule:3", "G3 => ID = 3.0 || ID = null")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala`
around lines 269 - 271, The test currently uses integer literals for
redefine-rule options (redefine-rule:1, redefine-rule:2, redefine-rule:3) which
doesn't validate decimal-literal handling; update the rule expressions used in
Test42RedefineRulesSpec (the .option calls for "redefine-rule:1",
"redefine-rule:2", "redefine-rule:3") to use decimal literals (e.g., 1.0, 2.0,
3.0 or a preferred decimal format) in place of the integer literals so the
decimal-rule behavior is actually exercised while preserving the same logical
checks (in(...), equality and OR conditions).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala`:
- Around line 269-271: The test currently uses integer literals for
redefine-rule options (redefine-rule:1, redefine-rule:2, redefine-rule:3) which
doesn't validate decimal-literal handling; update the rule expressions used in
Test42RedefineRulesSpec (the .option calls for "redefine-rule:1",
"redefine-rule:2", "redefine-rule:3") to use decimal literals (e.g., 1.0, 2.0,
3.0 or a preferred decimal format) in place of the integer literals so the
decimal-rule behavior is actually exercised while preserving the same logical
checks (in(...), equality and OR conditions).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11f28eb2-74bc-4f6a-894c-ac5822d803c3

📥 Commits

Reviewing files that changed from the base of the PR and between 41b9590 and 05ad323.

📒 Files selected for processing (7)
  • README.md
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala
🚧 Files skipped from review as they are similar to previous changes (4)
  • README.md
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala

@yruslan yruslan merged commit 5ed6140 into master May 28, 2026
7 checks passed
@yruslan yruslan deleted the feature/850-add-rule-based-redefines-check branch May 28, 2026 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add expression-based rules for REDEFINES

1 participant