Skip to content

Fix XPath normalize-space function#324

Merged
naitoh merged 1 commit into
ruby:masterfrom
tompng:fix_normalize_spaces
Jun 8, 2026
Merged

Fix XPath normalize-space function#324
naitoh merged 1 commit into
ruby:masterfrom
tompng:fix_normalize_spaces

Conversation

@tompng

@tompng tompng commented Jun 5, 2026

Copy link
Copy Markdown
Member

It should return normalized string of the first node, just like other functions such as string() and number()

doc = Nokogiri::XML.parse(<<~XML)
  <a><b>breakfast     boosts\t\t
  
  concentration   </b><c>
  Coffee beans
  aroma
  </c><d>        Dessert
  \t\t    after    dinner</d></a>
XML
doc.xpath("normalize-space(//text())")
#=> "breakfast boosts concentration"

This bug is a blocker of implementing delayed nodeset ordering.

def match(path_stack, node)
  nodeset = [XPathNode.new(node, position: 1)]
  result = expr(path_stack, nodeset)
  case result
  when Array # nodeset ← HERE, normalize-space function returned array of string
    unnode(result).uniq # Changing to sort(result) failed
  else
    [result]
  end
end

Copilot AI review requested due to automatic review settings June 5, 2026 13:09

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR changes how normalize-space() behaves in REXML’s XPath functions and updates the associated test expectation to match the new behavior.

Changes:

  • Simplifies Functions::normalize_space to always coerce via string() and return a single normalized string.
  • Updates the test_normalize_space_strings assertion to only expect one normalized value.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
test/functions/test_base.rb Updates expected result for normalize-space(//text()) to a single string.
lib/rexml/functions.rb Replaces array-aware normalization logic with a single-string implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lib/rexml/functions.rb Outdated
else
string.to_s.strip.gsub(/\s+/um, ' ')
end
string(string).strip.gsub(/\s+/um, ' ')
Comment thread test/functions/test_base.rb Outdated
"Dessert after dinner",
],
normalized_texts)
assert_equal(["breakfast boosts concentration"], normalized_texts)
Comment thread test/functions/test_base.rb Outdated
"Dessert after dinner",
],
normalized_texts)
normalized_text = REXML::XPath.each(REXML::Document.new(source), "normalize-space(//text())").to_a

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalized_text has not been tested.

Comment thread lib/rexml/functions.rb Outdated
@@ -263,12 +263,7 @@ def Functions::string_length( string )
end

def Functions::normalize_space( string=nil )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is called without any arguments, as in normalize-space(), this causes a problem.

  • master
require "nokogiri"
require "rexml"

xml = <<-XML
        <root>
          <item>  apple  </item>
          <item>  orange  </item>
          <item>  banana  </item>
        </root>
      XML
puts Nokogiri::XML.parse(xml).xpath("//item/text()[normalize-space()='orange']")
#=> "  orange  "
puts REXML::XPath.first(doc, "//item/text()[normalize-space()='orange']")
#=> "  orange  "
  • this PR
puts REXML::XPath.first(doc, "//item/text()[normalize-space()='orange']")
#=> ""

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Fixed and test added.

It should return normalized string of the first node, just like other functions such as `string()` and `number()`
@tompng tompng force-pushed the fix_normalize_spaces branch from 5df57c1 to 6bdcd64 Compare June 8, 2026 10:42
Copilot AI review requested due to automatic review settings June 8, 2026 10:42

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread lib/rexml/functions.rb
Comment on lines +265 to +266
def Functions::normalize_space( object=@@context[:node] )
string(object).strip.gsub(/\s+/um, ' ')
Comment thread lib/rexml/functions.rb
Comment on lines +265 to +266
def Functions::normalize_space( object=@@context[:node] )
string(object).strip.gsub(/\s+/um, ' ')

@naitoh naitoh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@naitoh naitoh merged commit 311586c into ruby:master Jun 8, 2026
72 of 73 checks passed
@tompng tompng deleted the fix_normalize_spaces branch June 8, 2026 12:37
naitoh pushed a commit that referenced this pull request Jun 9, 2026
Similar to #324
Values should be nodeset(Array) or primitive, not `[primitive]`

Also removes these workarounds related to primitive wrapped in array
```ruby
# Workaround to handle single primitive value wrapped in an array
result = result[0] if result.kind_of? Array and result.length == 1

# Workaround to handle multiple primitive values wrapped in an array
# Arrays are always nodeset, so `if result.size > 0` is enough
if result.size > 0 and result.inject(false) {|k,s| s or k}
```

It works even wrapping with an array, but leaving it will be a blocker
of implementing delayed nodeset ordering.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants