We publish our evaluation approach so readers can understand what we prioritize and how to verify claims.