Resolve Issue #4229: "Author not detected in C++ file" #4983

Open
AdityaOP007 wants to merge 1 commit into aboutcode-org:develop from AdityaOP007:fix/Author-detection-in-C++-file
@AdityaOP007

Fix author detection when "Author:" tag lacks trailing space (Issue #4229)
Input:

    from cluecode.copyrights import detect_copyrights_from_lines

    lines = [
        (1, '// Date:9 April,2012'),
        (2, '// Author:Frankie.Chu'),
        (3, '// IDE Arduino-1.0'),
    ]
    print('Input Lines:')
    print(lines)
    print('Detected Authors:')
    # Assumes every detection for this input is an author detection (has .author).
    results = list(detect_copyrights_from_lines(lines))
    for r in results:
        print(f'- {r.author} (Found on line {r.start_line})')

Output:

    Input Lines:
    [(1, '// Date:9 April,2012'), (2, '// Author:Frankie.Chu'), (3, '// IDE Arduino-1.0')]
    Detected Authors:
    - Frankie.Chu (Found on line 2)

Changes to author detection (Issue #4229):

  1. Token Normalization (src/cluecode/copyrights.py):

    • Added a normalize_author_colon regex hook to the prepare_text_line pipeline. It finds "Author:" or "author:" tags that are not immediately followed by whitespace.
    • The hook inserts a space so that Author:Name becomes Author: Name, which lets the tokenizer separate the author keyword from the name.
  2. Lexer enhancements (src/cluecode/copyrights.py):

    • Added a new grammar lexer rule (r'^[A-Z][a-z]+\.[A-Z][a-z]+$', 'NAME').
    • Tags handles of the form Name.Name (such as Frankie.Chu) as NAME instead of a generic noun (NN), so the existing author detection grammar can extract them.
  3. Testing (tests/cluecode/test_copyrights_basic.py & tests/cluecode/data/authors/):

    • Added basic tokenization tests to verify prepare_text_line correctly normalizes spacing after colons without breaking existing correct formats.
    • Added a data-driven test (author_no_space_after_colon.cpp and its YAML) that reproduces the reported edge case to prevent future regressions.
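
For reviewers who want to try the two ideas outside scancode, here is a minimal standalone sketch. It is not the code from this PR: the regex in `_AUTHOR_COLON` and the helper `looks_like_name_handle` are assumptions made for illustration, and the real hook in src/cluecode/copyrights.py may differ. Only the `NAME` pattern is copied from the description above.

```python
import re

# Hypothetical stand-in for the normalize_author_colon hook; the actual
# regex used in src/cluecode/copyrights.py is not shown in this description.
_AUTHOR_COLON = re.compile(r'\b([Aa]uthor):(?=\S)')


def normalize_author_colon(line):
    """Insert a space after "Author:" or "author:" when none follows."""
    return _AUTHOR_COLON.sub(r'\1: ', line)


# The lexer pattern quoted in the description, applied to a single token.
_NAME_DOT_NAME = re.compile(r'^[A-Z][a-z]+\.[A-Z][a-z]+$')


def looks_like_name_handle(token):
    """Return True for tokens shaped like Name.Name (hypothetical helper)."""
    return bool(_NAME_DOT_NAME.match(token))


line = normalize_author_colon('// Author:Frankie.Chu')
print(line)  # // Author: Frankie.Chu

# A plain whitespace split stands in for the real tokenizer here.
names = [tok for tok in line.split() if looks_like_name_handle(tok)]
print(names)  # ['Frankie.Chu']
```

The lookahead `(?=\S)` leaves lines that already have a space after the colon untouched, which matches the "without breaking existing correct formats" goal of the tests.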
