fix(eval): read and write eval data files as utf-8#6298
Open
anxkhn wants to merge 1 commit into
Commit fc85348 standardized the eval subsystem on utf-8 for file based reads and writes, but a few eval data file operations were missed and still relied on the platform default encoding:
- agent_evaluator.load_json (reads eval datasets)
- agent_evaluator.migrate_eval_data_to_new_schema (writes the new file)
- agent_evaluator._get_initial_session (reads the initial session file)
- evaluation_generator.generate_responses_from_session (reads a session)
On platforms whose default encoding is not utf-8 (for example cp1252 on Windows), loading or writing eval datasets or sessions that contain non-ASCII characters (emoji, CJK, accents) raises UnicodeDecodeError or UnicodeEncodeError, and the write path was inconsistent with the utf-8 reads elsewhere in the same module (agent_evaluator.py already opens the eval set file with encoding="utf-8").
Add encoding="utf-8" to these four open() calls so all eval data I/O is utf-8 consistent, and add regression tests that force a non-utf-8 default encoding to confirm non-ASCII eval data and sessions round-trip.
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
Problem:
The eval subsystem standardized on UTF-8 for file based reads and writes in
commit fc85348f ("fix: add utf-8 encoding to file based reads and writes of
eval data"), which added encoding="utf-8" to the open() calls in
local_eval_set_results_manager.py and local_eval_sets_manager.py. A few
sibling eval data file operations were missed and still open text (JSON) files
without an explicit encoding, so they fall back to the platform default
(locale.getpreferredencoding(), e.g. cp1252 on Windows):
- agent_evaluator.load_json (reads eval datasets)
- agent_evaluator.migrate_eval_data_to_new_schema (writes the new-schema file)
- agent_evaluator._get_initial_session (reads the initial session file)
- evaluation_generator.generate_responses_from_session (reads a session)
On a host whose default encoding is not UTF-8, loading or writing eval datasets
or sessions that contain non-ASCII characters (emoji, CJK, accents) raises
UnicodeDecodeError/UnicodeEncodeError. The write in
migrate_eval_data_to_new_schema was also inconsistent with the UTF-8 reads
elsewhere in the same module: agent_evaluator.py already opens the eval set
file with encoding="utf-8" (the EvalSet load path), so the same file's data
could be written in one encoding and read back in another.
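The failure mode described above can be illustrated with a minimal sketch. It is not code from this PR: the file name and payload are made up for illustration, and the legacy default is simulated by passing encoding="ascii" or encoding="cp1252" explicitly, so the snippet behaves the same on any host.

```python
import json
import os
import tempfile

# Illustrative payload with accented, CJK, and emoji characters.
payload = {"query": "café 日本語 😀"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eval.json")  # hypothetical file name

    # Write raw (unescaped) non-ASCII characters as UTF-8 bytes.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)

    # Read path: a legacy default encoding cannot decode the UTF-8 bytes.
    # encoding="ascii" stands in for "no encoding argument on a non-UTF-8 host".
    try:
        with open(path, encoding="ascii") as f:
            json.load(f)
        read_error = None
    except UnicodeDecodeError as exc:
        read_error = exc

    # Write path: cp1252 has no mapping for the CJK or emoji characters.
    try:
        with open(path, "w", encoding="cp1252") as f:
            json.dump(payload, f, ensure_ascii=False)
        write_error = None
    except UnicodeEncodeError as exc:
        write_error = exc

    # With an explicit UTF-8 encoding on both sides the data round-trips.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False)
    with open(path, encoding="utf-8") as f:
        round_trip = json.load(f)

print(type(read_error).__name__, type(write_error).__name__, round_trip == payload)
```

Depending on the bytes involved, reading UTF-8 data through a legacy code page may also decode without raising and silently produce wrong characters, which is another reason to pin the encoding explicitly.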
Solution:
Add encoding="utf-8" to those four open() calls so all eval data I/O is
UTF-8 consistent, matching fc85348f and the existing encoding="utf-8" open
in the same module. This is the smallest change that removes the platform
dependency; it does not alter behavior on hosts that already default to UTF-8.
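As a rough sketch of the shape of the change (not the actual diff): the helper names below echo the functions listed in this description, but their bodies and signatures are assumptions made for illustration. The only point is the explicit encoding argument on each text-mode open().

```python
import json
import os
import tempfile


def load_json(file_path):
    """Read a JSON file; body is an assumed stand-in, not the ADK code."""
    # Explicit encoding: no longer depends on locale.getpreferredencoding().
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_json(file_path, data):
    """Hypothetical writer standing in for the migration's write step."""
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)


# Round-trip check with non-ASCII content (illustrative data).
sample = {"name": "Zoë", "note": "你好 🎉"}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "dataset.json")
    write_json(sample_path, sample)
    loaded = load_json(sample_path)

print(loaded == sample)
```

Pinning the encoding on both the read and the write side is what keeps a file written by one code path readable by another, regardless of the host's default.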
Testing Plan
Unit Tests:
Added four regression tests. They cannot rely on this machine's default
encoding already being UTF-8, so each test patches the module-level open
name to fall back to ASCII when a text-mode open omits encoding=, faithfully
reproducing a non-UTF-8 host (cp1252/Windows). Files are written with raw
non-ASCII characters (emoji + CJK + accented), as the eval subsystem's Pydantic
model_dump_json does.
- tests/unittests/evaluation/test_agent_evaluator.py (new file, 3 tests):
load_json, _get_initial_session, and migrate_eval_data_to_new_schema
(the migration test exercises both the read and the write path).
- tests/unittests/evaluation/test_evaluation_generator.py (1 added test):
generate_responses_from_session.
Pre-fix these tests fail with
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 (the emoji byte);
post-fix they pass, confirming the tests fail for the right reason and the
encoding argument is what fixes them.
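The patching technique described above might look roughly like the following. This is a sketch under stated assumptions, not the tests from this PR: ascii_default_open, read_without_encoding, read_with_encoding, and run_check are hypothetical names, and for self-containment the sketch patches builtins.open rather than a module-level open name as the actual tests do.

```python
import builtins
import json
import os
import tempfile
from unittest import mock

_real_open = builtins.open  # keep a handle on the unpatched builtin


def ascii_default_open(file, mode="r", *args, **kwargs):
    """Behave like open(), but default text-mode encoding to ASCII."""
    # encoding is the fourth positional parameter of open(), i.e. args[1] here.
    encoding_given = "encoding" in kwargs or len(args) >= 2
    if "b" not in mode and not encoding_given:
        kwargs["encoding"] = "ascii"
    return _real_open(file, mode, *args, **kwargs)


def read_without_encoding(path):
    with open(path) as f:  # pre-fix shape: relies on the default encoding
        return json.load(f)


def read_with_encoding(path):
    with open(path, encoding="utf-8") as f:  # post-fix shape
        return json.load(f)


def run_check():
    data = {"text": "café 日本語 😀"}  # illustrative non-ASCII payload
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "session.json")
        with _real_open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        # Simulate a host whose default text encoding is not UTF-8.
        with mock.patch("builtins.open", ascii_default_open):
            try:
                read_without_encoding(path)
                pre_fix_failed = False
            except UnicodeDecodeError:
                pre_fix_failed = True
            post_fix_ok = read_with_encoding(path) == data
    return pre_fix_failed, post_fix_ok


print(run_check())
```

Because the wrapper only injects a default when no encoding is supplied, a call that already passes encoding="utf-8" is unaffected, which is why the same patch separates pre-fix from post-fix behavior.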
Manual End-to-End (E2E) Tests:
Not applicable: this is an internal file-encoding fix with no UI or runner
surface. The behavior change is host-encoding-dependent and is covered by the
unit tests above, which simulate a non-UTF-8 host on any platform. To reproduce
the original failure manually on a non-UTF-8 host (or with
PYTHONUTF8=0/PYTHONIOENCODING forcing a legacy encoding): call
AgentEvaluator.evaluate(...) with an eval dataset JSON that contains an emoji
or CJK text; pre-fix it raises UnicodeDecodeError, post-fix it loads.
Checklist
Additional context
This continues the already-accepted pattern from fc85348f; after this change
all ten open() calls in src/google/adk/evaluation/ are UTF-8 consistent.
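One way such a claim could be checked mechanically is a small AST scan for text-mode open() calls that pass no encoding. The sketch below is an assumption about how one might audit a file, not tooling from this PR; find_open_calls_without_encoding is a hypothetical helper, and it only recognizes plain open(...) calls with a literal mode string.

```python
import ast


def find_open_calls_without_encoding(source):
    """Return line numbers of text-mode open() calls that pass no encoding."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        is_open_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        )
        if not is_open_call:
            continue
        # mode is the second positional argument or the "mode" keyword.
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        is_binary = (
            isinstance(mode, ast.Constant)
            and isinstance(mode.value, str)
            and "b" in mode.value
        )
        # encoding is the fourth positional parameter of open().
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        if not is_binary and not has_encoding:
            missing.append(node.lineno)
    return sorted(missing)


# Illustrative input: only the first call lacks an encoding in text mode.
sample_source = '''
with open("a.json") as f:
    pass
with open("b.json", "w", encoding="utf-8") as f:
    pass
with open("c.bin", "rb") as f:
    pass
'''

print(find_open_calls_without_encoding(sample_source))
```

Running the helper over each file's source text would flag any remaining call sites; calls made through other names (for example pathlib methods) are outside what this sketch detects.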