#850 Add the ability to define rules when redefined fields are valid#851
Conversation
…ased expressions attached to schema fields.
…stinguisher records.
… operators, and in()/if() functions in expression evaluator.
… rules feature.
WalkthroughAdds an expression grammar and evaluator, wires expression evaluators from reader parameters into copybook parsing, annotates AST fields/groups with rule expressions, and gates record extraction so REDEFINES alternatives are selected by evaluated boolean rules. ChangesREDEFINES Rule Expression Framework
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
JaCoCo code coverage report - 'cobol-parser'
|
JaCoCo code coverage report - 'spark-cobol'
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala (1)
39-80:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winActually throw
ExprSyntaxErrorin the fallback branches.
STATE0,STATE1, andMINUS_NUMcurrently instantiateExprSyntaxErrorbut never throw it. That lets malformed input be partially accepted; for example,new ExpressionEvaluator("1 true").evalInt()can return1because the trailingTRUE_LITERALfalls through the non-throwingSTATE1default. In this feature that means an invalid redefine rule can survive parsing and then gate extraction with the wrong predicate.🔧 Minimal fix
- case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}") + case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}") ... - case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}") + case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}") ... - case _ => new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}") + case _ => throw new ExprSyntaxError(s"Unexpected '$token' at pos ${token.pos}")Also applies to: 82-154
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala` around lines 39 - 80, In the Parser.scala state-machine handlers (states STATE0, STATE1, MINUS_NUM and the other state match blocks), change the fallback branches that currently instantiate ExprSyntaxError (e.g. "new ExprSyntaxError(...)") to actually throw the exception (throw new ExprSyntaxError(...)); ensure every default/case _ branch in functions handling tokens throws rather than returns/creates the error so malformed tokens (e.g. trailing TRUE_LITERAL) cause parsing to fail.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala`:
- Around line 40-99: ExpressionEvaluator retains prior variable bindings causing
stale values; add a public clearValues() method on ExpressionEvaluator that
empties vars, stringVars and nullVars (i.e., clear mutable.HashMap/HashSet) and
then call expr.clearValues() from RecordExtractors.canExtract() immediately
before setting variables for the current record so each evaluation starts with a
clean binding slate; keep existing setValue/setStringValue/setNullValue and
eval*/getVariables behavior unchanged.
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala`:
- Around line 607-610: The pattern that converts numeric rule variables using
v.toString.toInt is brittle; update the match in RecordExtractors (the v match
branch that currently calls expr.setStringValue(k, s) or expr.setValue(k,
v.toString.toInt)) to explicitly handle numeric types (e.g., case i: Int =>
expr.setValue(k, i); case l: Long => validate Long fits Int range then
expr.setValue(k, l.toInt); case s: Short|Byte => expr.setValue(k, s.toInt); case
bd: BigInt => check bounds then toInt; and explicitly handle
Float/Double/BigDecimal by either rejecting or converting with clear rounding
policy), and on string inputs parse safely with validation instead of blind
toInt, throwing or logging a clear error when a value is out of Int range or not
an integer.
- Around line 599-613: The ExpressionEvaluator (expr) retains prior variable
values, so before setting current record variables in canExtract you must
clear/reset the evaluator state to avoid stale values; call an available
reset/clear method on field.ruleExpression (e.g., expr.reset() or
expr.clearVariables()) or, if no such API exists, fetch the evaluator's current
variable names and call expr.setNullValue(name) for any name not present in the
incoming variables map, then set the incoming variables via expr.setNullValue /
setStringValue / setValue and finally call expr.evalBool().
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scala`:
- Around line 784-802: getRedefineRuleExpressionMapping currently allows later
redefine rules to silently overwrite earlier ones; change it to detect
duplicates on the transformed redefine target and throw an
IllegalArgumentException when a duplicate target is found. Specifically, when
iterating params.getMap in getRedefineRuleExpressionMapping, after computing val
redefine = CopybookParser.transformIdentifier(splitVal(0).trim) check whether
that redefine is already present in the accumulating result (or a local HashSet
of seen targets); if it is, throw IllegalArgumentException describing the
conflicting redefine target and the original option values; otherwise mark the
target as seen and add the (redefine -> rule) pair. Ensure this behavior applies
for keys starting with PARAM_REDEFINE_RULE_PREFIX or
PARAM_REDEFINE_RULE_PREFIX_ALT and still calls params.markUsed(k).
In `@README.md`:
- Around line 1033-1036: Update the README fenced code blocks to include
language identifiers and fix the heading level: add ```scala before the
`.option("redefine_rule:1", "COMPANY => RECORD_TYPE = 'C'")` /
`.option("redefine_rule:2", "PERSON => in(RECORD_TYPE, 'P', 'E')")` block,
change the df example block to ```text (the block starting with `df.show(10)`),
and change the `#### Notes` heading to `### Notes`; apply the same fixes to the
other similar occurrences (the other `.option` blocks and df.show block noted in
the comment).
- Line 1038: Fix the two user-facing typos in the README example: change the
word "lok" to "look" in the sentence "For the above example the load options
will lok like this" and correct the mysterious "RECODD..." token to the intended
"RECORD" (or the correct full identifier used elsewhere) wherever it appears
(including the other occurrence around the mentioned line ~1070); update both
example occurrences so copy/paste guidance reads correctly and consistently.
---
Outside diff comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scala`:
- Around line 39-80: In the Parser.scala state-machine handlers (states STATE0,
STATE1, MINUS_NUM and the other state match blocks), change the fallback
branches that currently instantiate ExprSyntaxError (e.g. "new
ExprSyntaxError(...)") to actually throw the exception (throw new
ExprSyntaxError(...)); ensure every default/case _ branch in functions handling
tokens throws rather than returns/creates the error so malformed tokens (e.g.
trailing TRUE_LITERAL) cause parsing to fail.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 81ca08a9-7e73-4e74-9e58-a6bee8b1e4bf
📒 Files selected for processing (30)
README.mdcobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/CopybookParser.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/antlr/ParserVisitor.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Group.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Primitive.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/Statement.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/asttransform/RuleExpressionSetter.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/exceptions/RuleExpressionParsingException.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/lexer/Lexer.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/lexer/Token.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExpressionBuilder.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExpressionBuilderImpl.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/ExtractVariablesBuilder.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/NumExprBuilderImpl.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/raw/FixedWithRecordLengthExprRawRecordExtractor.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/RecordLengthExpression.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/iterator/RecordLengthField.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParameters.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/ReaderParameters.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/schema/CobolSchema.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/validator/ReaderParametersValidator.scalacobol-parser/src/test/scala/za/co/absa/cobrix/cobol/expression/ExpressionEvaluatorSuite.scalacobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/asttransform/RuleExpressionSetterSuite.scalacobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/extract/BinaryExtractorSpec.scalaspark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scalaspark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala
💤 Files with no reviewable changes (1)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/NumExprBuilderImpl.scala
…, and support decimal values in rule expressions (Thanks @coderabbitai).
There was a problem hiding this comment.
🧹 Nitpick comments (1)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala (1)
269-271: ⚡ Quick winUse decimal literals in the decimal-rule test to validate the actual feature.
Line 269-Line 271 currently use integer literals, so this case doesn’t strictly prove decimal literal handling in rule expressions.
Proposed test adjustment
- .option("redefine-rule:1", "G1 => in(ID, 1, null)") - .option("redefine-rule:2", "G2 => ID=2 && true") - .option("redefine-rule:3", "G3 => ID = 3 || ID = null") + .option("redefine-rule:1", "G1 => in(ID, 1.1, null)") + .option("redefine-rule:2", "G2 => ID = 2.2 && true") + .option("redefine-rule:3", "G3 => ID = 3.0 || ID = null")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala` around lines 269 - 271, The test currently uses integer literals for redefine-rule options (redefine-rule:1, redefine-rule:2, redefine-rule:3) which doesn't validate decimal-literal handling; update the rule expressions used in Test42RedefineRulesSpec (the .option calls for "redefine-rule:1", "redefine-rule:2", "redefine-rule:3") to use decimal literals (e.g., 1.0, 2.0, 3.0 or a preferred decimal format) in place of the integer literals so the decimal-rule behavior is actually exercised while preserving the same logical checks (in(...), equality and OR conditions).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala`:
- Around line 269-271: The test currently uses integer literals for
redefine-rule options (redefine-rule:1, redefine-rule:2, redefine-rule:3) which
doesn't validate decimal-literal handling; update the rule expressions used in
Test42RedefineRulesSpec (the .option calls for "redefine-rule:1",
"redefine-rule:2", "redefine-rule:3") to use decimal literals (e.g., 1.0, 2.0,
3.0 or a preferred decimal format) in place of the integer literals so the
decimal-rule behavior is actually exercised while preserving the same logical
checks (in(...), equality and OR conditions).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 11f28eb2-74bc-4f6a-894c-ac5822d803c3
📒 Files selected for processing (7)
README.mdcobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/parser/Parser.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scalacobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/parameters/CobolParametersParser.scalaspark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scalaspark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/integration/Test42RedefineRulesSpec.scala
🚧 Files skipped from review as they are similar to previous changes (4)
- README.md
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/ParametersParsingSpec.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/extractors/record/RecordExtractors.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/expression/ExpressionEvaluator.scala
Closes #850
Summary by CodeRabbit
New Features
Documentation