fix(eval): read and write eval data files as utf-8#6298

Open
anxkhn wants to merge 1 commit into
google:mainfrom
anxkhn:fix/eval-open-utf8-encoding

@anxkhn anxkhn commented Jul 5, 2026

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

  • N/A (no existing issue; change described below per the issue templates)

2. Or, if no issue exists, describe the change:

Problem:

The eval subsystem standardized on UTF-8 for file based reads and writes in
commit fc85348f ("fix: add utf-8 encoding to file based reads and writes of
eval data"), which added encoding="utf-8" to the open() calls in
local_eval_set_results_manager.py and local_eval_sets_manager.py. A few
sibling eval data file operations were missed and still open text (JSON) files
without an explicit encoding, so they fall back to the platform default
(locale.getpreferredencoding(), e.g. cp1252 on Windows):

  • agent_evaluator.load_json (reads eval datasets)
  • agent_evaluator.migrate_eval_data_to_new_schema (writes the new-schema file)
  • agent_evaluator._get_initial_session (reads the initial session file)
  • evaluation_generator.generate_responses_from_session (reads a session)

On a host whose default encoding is not UTF-8, loading eval datasets or
sessions that contain non-ASCII characters (emoji, CJK, accents) can raise
UnicodeDecodeError, and writing them can raise UnicodeEncodeError. The write in
migrate_eval_data_to_new_schema was also inconsistent with the UTF-8 reads
elsewhere in the same module: agent_evaluator.py already opens the eval set
file with encoding="utf-8" (the EvalSet load path), so the same data could be
written in one encoding and read back in another.
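
The mismatch described above can be illustrated without ADK. The following is a minimal sketch, not code from this PR: the sample payload and helper names are hypothetical, and the legacy encoding is passed explicitly so the result does not depend on the host. Note that a cp1252 read does not always raise; some UTF-8 byte sequences decode silently into garbled text instead.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample payload; not taken from the ADK test data.
SAMPLE = {"query": "caf\u00e9 \U0001F600 \u3042"}  # accent, emoji, hiragana


def write_json(path, data, encoding):
    with open(path, "w", encoding=encoding) as f:
        # ensure_ascii=False keeps raw non-ASCII characters in the file.
        json.dump(data, f, ensure_ascii=False)


def read_json(path, encoding):
    with open(path, "r", encoding=encoding) as f:
        return json.load(f)


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "eval.json"

        # UTF-8 on both sides round-trips.
        write_json(path, SAMPLE, "utf-8")
        results["utf8_round_trip"] = read_json(path, "utf-8") == SAMPLE

        # Reading the UTF-8 file as cp1252 hits a byte cp1252 leaves undefined.
        try:
            read_json(path, "cp1252")
            results["cp1252_read"] = "loaded"
        except UnicodeDecodeError:
            results["cp1252_read"] = "UnicodeDecodeError"

        # Writing the same data as cp1252 fails on the emoji.
        try:
            write_json(path, SAMPLE, "cp1252")
            results["cp1252_write"] = "written"
        except UnicodeEncodeError:
            results["cp1252_write"] = "UnicodeEncodeError"

    # An emoji alone decodes "successfully" as cp1252, but to garbled text.
    results["emoji_as_cp1252"] = "\U0001F600".encode("utf-8").decode("cp1252")
    return results
```

Calling demo() returns a dict summarising each case; the silent-garbling case is the reason an explicit encoding is safer than relying on an exception to surface the problem.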

Solution:

Add encoding="utf-8" to those four open() calls so all eval data I/O is
UTF-8 consistent, matching fc85348f and the existing encoding="utf-8" open
in the same module. This is the smallest change that removes the platform
dependency; it does not alter behavior on hosts that already default to UTF-8.

 def load_json(file_path: str) -> Union[Dict, List]:
-  with open(file_path, "r") as f:
+  with open(file_path, "r", encoding="utf-8") as f:
     return json.load(f)

-    with open(new_eval_data_file, "w") as f:
+    with open(new_eval_data_file, "w", encoding="utf-8") as f:
       f.write(eval_set.model_dump_json(indent=2))

-      with open(initial_session_file, "r") as f:
+      with open(initial_session_file, "r", encoding="utf-8") as f:
         initial_session = json.loads(f.read())

-    with open(session_path, "r") as f:
+    with open(session_path, "r", encoding="utf-8") as f:
       session_data = Session.model_validate_json(f.read())

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Added four regression tests. Because the test host's default encoding may
already be UTF-8 (in which case the tests would pass even without the fix),
each test patches the module-level open name so that a text-mode open without
encoding= falls back to ASCII, reproducing the failure mode of a non-UTF-8
host (e.g. cp1252 on Windows). Files are written with raw non-ASCII characters
(emoji + CJK + accented), as the eval subsystem's Pydantic model_dump_json
does.

  • tests/unittests/evaluation/test_agent_evaluator.py (new file, 3 tests):
    load_json, _get_initial_session, and migrate_eval_data_to_new_schema
    (the migration test exercises both the read and the write path).
  • tests/unittests/evaluation/test_evaluation_generator.py (1 added test):
    generate_responses_from_session.

Pre-fix, these tests fail with UnicodeDecodeError: 'ascii' codec can't decode
byte 0xf0 (the lead byte of the UTF-8-encoded emoji); post-fix they pass,
confirming that the tests fail for the right reason and that the encoding
argument is what fixes them.
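
The host-simulation technique used by these tests can be sketched as follows. This is an illustration under assumptions, not the test code from this PR: ascii_default_open and the two loader functions are hypothetical names, and the sketch patches builtins.open, whereas the PR patches the open name in the module under test.

```python
import builtins
import json
import tempfile
from pathlib import Path
from unittest import mock

_real_open = builtins.open  # captured before patching to avoid recursion


def ascii_default_open(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Behave like open(), but default text-mode opens to ASCII."""
    if "b" not in mode and encoding is None:
        encoding = "ascii"
    return _real_open(file, mode, buffering, encoding, **kwargs)


def load_json_without_encoding(path):
    # Stand-in for the pre-fix pattern: no explicit encoding.
    with open(path, "r") as f:
        return json.load(f)


def load_json_with_encoding(path):
    # Stand-in for the post-fix pattern.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def run_demo():
    data = {"text": "\U0001F600 \u65e5\u672c\u8a9e \u00e9"}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.json"
        # Write raw (unescaped) non-ASCII characters as UTF-8.
        path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

        with mock.patch("builtins.open", ascii_default_open):
            try:
                load_json_without_encoding(path)
                pre_fix = "loaded"
            except UnicodeDecodeError:
                pre_fix = "UnicodeDecodeError"
            post_fix_ok = load_json_with_encoding(path) == data
    return pre_fix, post_fix_ok
```

Falling back to ASCII rather than cp1252 makes the failure deterministic: every non-ASCII byte is rejected, so the test cannot pass by accident on input that happens to be decodable in a legacy code page.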

$ pytest tests/unittests/evaluation/test_agent_evaluator.py \
         tests/unittests/evaluation/test_evaluation_generator.py
24 passed, 4 warnings

$ pytest tests/unittests/evaluation
488 passed, 25 warnings

Manual End-to-End (E2E) Tests:

Not applicable: this is an internal file-encoding fix with no UI or runner
surface. The behavior change is host-encoding-dependent and is covered by the
unit tests above, which simulate a non-UTF-8 host on any platform. To reproduce
the original failure manually, use a host whose default text encoding is not
UTF-8 (for example Windows with a legacy code page and Python UTF-8 mode off,
PYTHONUTF8=0; PYTHONIOENCODING does not help here because it only affects the
standard streams, not open()): call AgentEvaluator.evaluate(...) with an eval
dataset JSON that contains an emoji or CJK text; pre-fix it raises
UnicodeDecodeError (or, if every byte happens to be valid in the legacy code
page, loads garbled text), post-fix it loads correctly.
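
Before attempting a manual reproduction, it may help to confirm what the host actually defaults to. The snippet below is a small sketch using only standard-library calls; the function name is hypothetical. locale.getpreferredencoding(False) reports the encoding a text-mode open() falls back to when encoding= is omitted, while the stdout encoding is shown separately because it is configured independently.

```python
import locale
import sys


def describe_default_text_encoding():
    """Report the settings that decide open()'s default text encoding."""
    return {
        # True when Python UTF-8 mode is active (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding used by open() when encoding= is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Standard-stream encoding; independent of file I/O defaults.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_default_text_encoding().items():
        print(f"{key}: {value}")
```

If preferred_encoding already reports UTF-8, the pre-fix code will not fail on that host and the reproduction needs a different environment.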

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end. (N/A - internal encoding fix; covered by host-simulating unit tests)
  • Any dependent changes have been merged and published in downstream modules. (none)

Additional context

This continues the already-accepted pattern from fc85348f; after this change
all ten open() calls in src/google/adk/evaluation/ are UTF-8 consistent.
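
One way to check a claim like this mechanically is to scan the sources for open() calls that omit encoding=. The following is a rough sketch, not tooling from this PR: the function name is hypothetical, it only recognises direct calls to the bare name open, and it treats any call whose mode is not a literal containing "b" as text mode.

```python
import ast


def find_open_without_encoding(source):
    """Return line numbers of text-mode open() calls lacking encoding=."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
        ):
            continue

        # mode is the second positional argument or the mode= keyword.
        mode = None
        if len(node.args) >= 2 and isinstance(node.args[1], ast.Constant):
            mode = node.args[1].value
        for kw in node.keywords:
            if kw.arg == "mode" and isinstance(kw.value, ast.Constant):
                mode = kw.value.value
        if isinstance(mode, str) and "b" in mode:
            continue  # binary mode takes no encoding

        # encoding is the fourth positional argument or the encoding= keyword.
        has_encoding = len(node.args) >= 4 or any(
            kw.arg == "encoding" for kw in node.keywords
        )
        if not has_encoding:
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE_SOURCE = '''
with open(p, "r") as f:
    pass
with open(p, "w", encoding="utf-8") as f:
    pass
with open(p, "rb") as f:
    pass
'''
```

Running find_open_without_encoding(SAMPLE_SOURCE) flags only the first call; applying it to each file under a package directory would give a quick audit, subject to the limitations noted above.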

Commit fc85348 standardized the eval subsystem on utf-8 for file based
reads and writes, but a few eval data file operations were missed and
still relied on the platform default encoding:

- agent_evaluator.load_json (reads eval datasets)
- agent_evaluator.migrate_eval_data_to_new_schema (writes the new file)
- agent_evaluator._get_initial_session (reads the initial session file)
- evaluation_generator.generate_responses_from_session (reads a session)

On platforms whose default encoding is not utf-8 (for example cp1252 on
Windows), loading or writing eval datasets or sessions that contain
non-ASCII characters (emoji, CJK, accents) raises UnicodeDecodeError or
UnicodeEncodeError, and the write path was inconsistent with the utf-8
reads elsewhere in the same module (agent_evaluator.py already opens the
eval set file with encoding="utf-8").

Add encoding="utf-8" to these four open() calls so all eval data I/O is
utf-8 consistent, and add regression tests that force a non-utf-8 default
encoding to confirm non-ASCII eval data and sessions round-trip.