@mehmetbutgul
Contributor

Description

Adds sentence metadata (Map("sentence" -> "0")) to the DocumentAssembler output in LightPipeline.
With this default in place, sentence information is always present in the document annotations that LightPipeline produces.
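
As a rough illustration of the idea only (not the code in this PR), the sketch below models a document annotation with a simplified stand-in class and fills in the default sentence entry. The names `Doc`, `SentenceMetadataSketch`, `withSentenceMetadata`, and `assemble` are hypothetical and do not come from Spark NLP. Letting an existing "sentence" key take precedence over the default is also an assumption of this sketch.

```scala
// Simplified stand-in for a document annotation.
// This is NOT Spark NLP's Annotation class; it only carries the fields needed here.
final case class Doc(begin: Int, end: Int, result: String, metadata: Map[String, String])

object SentenceMetadataSketch {
  // Default taken from the PR description: Map("sentence" -> "0").
  val DefaultSentenceMetadata: Map[String, String] = Map("sentence" -> "0")

  // Hypothetical helper: the default goes on the left of ++ so that
  // any "sentence" value already present in the document wins.
  def withSentenceMetadata(doc: Doc): Doc =
    doc.copy(metadata = DefaultSentenceMetadata ++ doc.metadata)

  // Hypothetical light-path assembler: wraps raw text in a document
  // and attaches the default sentence metadata.
  def assemble(text: String): Doc =
    withSentenceMetadata(Doc(0, text.length - 1, text, Map.empty))
}
```

Calling `SentenceMetadataSketch.assemble("Hello world.")` in this sketch yields a document whose metadata contains the "sentence" key, which is the property the PR aims to guarantee for LightPipeline output.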

Motivation and Context

Previously, transform() / fullAnnotate() and LightPipeline produced different metadata for documents.
Adding the default sentence metadata in LightPipeline removes this inconsistency, so Pipeline and LightPipeline executions return identical document metadata.

How Has This Been Tested?

  • Added a dedicated Scala unit test covering this behavior
  • Verified that all existing LightPipeline tests pass successfully

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@mehmetbutgul changed the title from "2025 12 22 sentence idx in light pipeline" to "Addition sentence info into DocumentAssembler output in LightPipeline" on Dec 22, 2025

4 participants