Microsoft changed the default text encoding of notepad.exe to UTF-8 in the May 2019 Update. I propose changing Python's default text encoding too, from 2021. I believe 2021 is not too early for this change.

Currently, TextIOWrapper uses locale.getpreferredencoding(False) (hereinafter called "locale encoding") when encoding is not specified. This PEP proposes changing the default text encoding to "UTF-8". (If 3.9 is released in 2020, this PEP will be applied to 3.10, although a deprecation warning will be raised from 3.8.)

Motivation

People assume it is always UTF-8

Package authors using macOS or Linux may forget that the default encoding is not always UTF-8.

For example, consider long_description = open("README.md").read() in a package's setup script. If the README.md file contains any non-ASCII character, many Windows users cannot install the package due to a UnicodeDecodeError.

Some tools on Windows change the active code page to 65001 (UTF-8), and Microsoft is using UTF-8 and cp65001 more widely in recent versions of Windows.

For example, "Command Prompt" uses the legacy code page by default. But the Windows Subsystem for Linux (WSL) changes the active code page to 65001, and python.exe can be executed from the WSL. So python.exe executed from the legacy console and python.exe executed from the WSL cannot read text files written by each other.

But many Windows users don't understand which code page is active, so changing the default text file encoding based on the active code page would confuse them.

A consistent default text encoding will make Python's behavior more predictable and easier to learn.

Using UTF-8 by default is easier on new programmers
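The README.md pitfall described in the Motivation can be illustrated with a minimal sketch. The file content, the helper name read_readme, and the choice of cp1252 as the stand-in legacy code page are assumptions made for illustration only; they do not come from the PEP. The sketch writes a UTF-8 file, reads it back with an explicit encoding, and then simulates a machine whose locale encoding is cp1252.

```python
import locale
import tempfile
from pathlib import Path

# Hypothetical README content. The snake emoji encodes to UTF-8 bytes
# that include 0x90, which Python's cp1252 codec cannot decode.
README_TEXT = "My package \U0001F40D\n"


def read_readme(path, encoding=None):
    """Read a text file; encoding=None lets Python pick its default."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text(README_TEXT, encoding="utf-8")

    # The "locale encoding" referred to in the text above.
    print("locale encoding:", locale.getpreferredencoding(False))

    # An explicit encoding gives the same result on every platform.
    utf8_result = read_readme(readme, encoding="utf-8")

    # Simulate a Windows machine whose locale encoding is cp1252.
    try:
        read_readme(readme, encoding="cp1252")
        legacy_failed = False
    except UnicodeDecodeError:
        legacy_failed = True

print("read with utf-8 matches:", utf8_result == README_TEXT)
print("read with cp1252 failed:", legacy_failed)
```

Until the default changes, passing encoding="utf-8" explicitly, as in the first read above, is the portable way to open files that are known to be UTF-8.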