Commit 333d10c

bpo-43712 : fileinput: Add encoding parameter (GH-25272)
1 parent 133705b commit 333d10c

6 files changed (+119 -38 lines)

Doc/library/fileinput.rst (+26 -9)

@@ -18,7 +18,7 @@ write one file see :func:`open`.
 The typical use is::
 
    import fileinput
-   for line in fileinput.input():
+   for line in fileinput.input(encoding="utf-8"):
        process(line)
 
 This iterates over the lines of all files listed in ``sys.argv[1:]``, defaulting
@@ -49,13 +49,14 @@ a file may not have one.
 You can control how files are opened by providing an opening hook via the
 *openhook* parameter to :func:`fileinput.input` or :class:`FileInput()`. The
 hook must be a function that takes two arguments, *filename* and *mode*, and
-returns an accordingly opened file-like object. Two useful hooks are already
-provided by this module.
+returns an accordingly opened file-like object. If *encoding* and/or *errors*
+are specified, they will be passed to the hook as aditional keyword arguments.
+This module provides a :func:`hook_encoded` to support compressed files.
 
 The following function is the primary interface of this module:
 
 
-.. function:: input(files=None, inplace=False, backup='', *, mode='r', openhook=None)
+.. function:: input(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)
 
    Create an instance of the :class:`FileInput` class. The instance will be used
    as global state for the functions of this module, and is also returned to use
@@ -66,7 +67,7 @@ The following function is the primary interface of this module:
    :keyword:`with` statement. In this example, *input* is closed after the
    :keyword:`!with` statement is exited, even if an exception occurs::
 
-      with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
+      with fileinput.input(files=('spam.txt', 'eggs.txt'), encoding="utf-8") as f:
           for line in f:
               process(line)
 
@@ -76,6 +77,9 @@ The following function is the primary interface of this module:
    .. versionchanged:: 3.8
       The keyword parameters *mode* and *openhook* are now keyword-only.
 
+   .. versionchanged:: 3.10
+      The keyword-only parameter *encoding* and *errors* are added.
+
 
 The following functions use the global state created by :func:`fileinput.input`;
 if there is no active state, :exc:`RuntimeError` is raised.
@@ -137,7 +141,7 @@ The class which implements the sequence behavior provided by the module is
 available for subclassing as well:
 
 
-.. class:: FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None)
+.. class:: FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None, encoding=None, errors=None)
 
    Class :class:`FileInput` is the implementation; its methods :meth:`filename`,
    :meth:`fileno`, :meth:`lineno`, :meth:`filelineno`, :meth:`isfirstline`,
@@ -155,14 +159,15 @@ available for subclassing as well:
    *filename* and *mode*, and returns an accordingly opened file-like object. You
    cannot use *inplace* and *openhook* together.
 
+   You can specify *encoding* and *errors* that is passed to :func:`open` or *openhook*.
+
    A :class:`FileInput` instance can be used as a context manager in the
    :keyword:`with` statement. In this example, *input* is closed after the
    :keyword:`!with` statement is exited, even if an exception occurs::
 
       with FileInput(files=('spam.txt', 'eggs.txt')) as input:
           process(input)
 
-
    .. versionchanged:: 3.2
       Can be used as a context manager.
 
@@ -175,6 +180,8 @@ available for subclassing as well:
    .. versionchanged:: 3.8
       The keyword parameter *mode* and *openhook* are now keyword-only.
 
+   .. versionchanged:: 3.10
+      The keyword-only parameter *encoding* and *errors* are added.
 
 
 **Optional in-place filtering:** if the keyword argument ``inplace=True`` is
@@ -191,14 +198,20 @@ when standard input is read.
 
 The two following opening hooks are provided by this module:
 
-.. function:: hook_compressed(filename, mode)
+.. function:: hook_compressed(filename, mode, *, encoding=None, errors=None)
 
    Transparently opens files compressed with gzip and bzip2 (recognized by the
    extensions ``'.gz'`` and ``'.bz2'``) using the :mod:`gzip` and :mod:`bz2`
    modules. If the filename extension is not ``'.gz'`` or ``'.bz2'``, the file is
    opened normally (ie, using :func:`open` without any decompression).
 
-   Usage example: ``fi = fileinput.FileInput(openhook=fileinput.hook_compressed)``
+   The *encoding* and *errors* values are passed to to :class:`io.TextIOWrapper`
+   for compressed files and open for normal files.
+
+   Usage example: ``fi = fileinput.FileInput(openhook=fileinput.hook_compressed, encoding="utf-8")``
+
+   .. versionchanged:: 3.10
+      The keyword-only parameter *encoding* and *errors* are added.
 
 
 .. function:: hook_encoded(encoding, errors=None)
@@ -212,3 +225,7 @@ The two following opening hooks are provided by this module:
 
    .. versionchanged:: 3.6
       Added the optional *errors* parameter.
+
+   .. deprecated:: 3.10
+      This function is deprecated since :func:`input` and :class:`FileInput`
+      now have *encoding* and *errors* parameters.

Doc/whatsnew/3.10.rst (+11)

@@ -760,6 +760,17 @@ enum
 module constants have a :func:`repr` of ``module_name.member_name``.
 (Contributed by Ethan Furman in :issue:`40066`.)
 
+fileinput
+---------
+
+Added *encoding* and *errors* parameters in :func:`fileinput.input` and
+:class:`fileinput.FileInput`.
+(Contributed by Inada Naoki in :issue:`43712`.)
+
+:func:`fileinput.hook_compressed` now returns :class:`TextIOWrapper` object
+when *mode* is "r" and file is compressed, like uncompressed files.
+(Contributed by Inada Naoki in :issue:`5758`.)
+
 gc
 --
 
Lib/fileinput.py (+41 -17)

@@ -3,7 +3,7 @@
 Typical use is:
 
     import fileinput
-    for line in fileinput.input():
+    for line in fileinput.input(encoding="utf-8"):
         process(line)
 
 This iterates over the lines of all files listed in sys.argv[1:],
@@ -63,15 +63,9 @@
 deleted when the output file is closed. In-place filtering is
 disabled when standard input is read. XXX The current implementation
 does not work for MS-DOS 8+3 filesystems.
-
-XXX Possible additions:
-
-- optional getopt argument processing
-- isatty()
-- read(), read(size), even readlines()
-
 """
 
+import io
 import sys, os
 from types import GenericAlias
 
@@ -81,7 +75,8 @@
 
 _state = None
 
-def input(files=None, inplace=False, backup="", *, mode="r", openhook=None):
+def input(files=None, inplace=False, backup="", *, mode="r", openhook=None,
+          encoding=None, errors=None):
     """Return an instance of the FileInput class, which can be iterated.
 
     The parameters are passed to the constructor of the FileInput class.
@@ -91,7 +86,8 @@ def input(files=None, inplace=False, backup="", *, mode="r", openhook=None):
     global _state
     if _state and _state._file:
         raise RuntimeError("input() already active")
-    _state = FileInput(files, inplace, backup, mode=mode, openhook=openhook)
+    _state = FileInput(files, inplace, backup, mode=mode, openhook=openhook,
+                       encoding=encoding, errors=errors)
     return _state
 
 def close():
@@ -186,7 +182,7 @@ class FileInput:
     """
 
     def __init__(self, files=None, inplace=False, backup="", *,
-                 mode="r", openhook=None):
+                 mode="r", openhook=None, encoding=None, errors=None):
         if isinstance(files, str):
             files = (files,)
         elif isinstance(files, os.PathLike):
@@ -209,6 +205,16 @@ def __init__(self, files=None, inplace=False, backup="", *,
         self._file = None
         self._isstdin = False
         self._backupfilename = None
+        self._encoding = encoding
+        self._errors = errors
+
+        # We can not use io.text_encoding() here because old openhook doesn't
+        # take encoding parameter.
+        if "b" not in mode and encoding is None and sys.flags.warn_default_encoding:
+            import warnings
+            warnings.warn("'encoding' argument not specified.",
+                          EncodingWarning, 2)
+
         # restrict mode argument to reading modes
         if mode not in ('r', 'rU', 'U', 'rb'):
             raise ValueError("FileInput opening mode must be one of "
@@ -362,9 +368,20 @@ def _readline(self):
             else:
                 # This may raise OSError
                 if self._openhook:
-                    self._file = self._openhook(self._filename, self._mode)
+                    # Custom hooks made previous to Python 3.10 didn't have
+                    # encoding argument
+                    if self._encoding is None:
+                        self._file = self._openhook(self._filename, self._mode)
+                    else:
+                        self._file = self._openhook(
+                            self._filename, self._mode, encoding=self._encoding, errors=self._errors)
                 else:
-                    self._file = open(self._filename, self._mode)
+                    # EncodingWarning is emitted in __init__() already
+                    if "b" not in self._mode:
+                        encoding = self._encoding or "locale"
+                    else:
+                        encoding = None
+                    self._file = open(self._filename, self._mode, encoding=encoding, errors=self._errors)
         self._readline = self._file.readline # hide FileInput._readline
         return self._readline()
 
@@ -395,16 +412,23 @@ def isstdin(self):
     __class_getitem__ = classmethod(GenericAlias)
 
 
-def hook_compressed(filename, mode):
+def hook_compressed(filename, mode, *, encoding=None, errors=None):
+    if encoding is None: # EncodingWarning is emitted in FileInput() already.
+        encoding = "locale"
     ext = os.path.splitext(filename)[1]
     if ext == '.gz':
         import gzip
-        return gzip.open(filename, mode)
+        stream = gzip.open(filename, mode)
     elif ext == '.bz2':
         import bz2
-        return bz2.BZ2File(filename, mode)
+        stream = bz2.BZ2File(filename, mode)
     else:
-        return open(filename, mode)
+        return open(filename, mode, encoding=encoding, errors=errors)
+
+    # gzip and bz2 are binary mode by default.
+    if "b" not in mode:
+        stream = io.TextIOWrapper(stream, encoding=encoding, errors=errors)
+    return stream
 
 
 def hook_encoded(encoding, errors=None):

Lib/test/test_fileinput.py (+38 -12)

@@ -2,6 +2,7 @@
 Tests for fileinput module.
 Nick Mathewson
 '''
+import io
 import os
 import sys
 import re
@@ -238,7 +239,7 @@ def test_opening_mode(self):
         # try opening in universal newline mode
         t1 = self.writeTmp(b"A\nB\r\nC\rD", mode="wb")
         with warnings_helper.check_warnings(('', DeprecationWarning)):
-            fi = FileInput(files=t1, mode="U")
+            fi = FileInput(files=t1, mode="U", encoding="utf-8")
         with warnings_helper.check_warnings(('', DeprecationWarning)):
             lines = list(fi)
         self.assertEqual(lines, ["A\n", "B\n", "C\n", "D"])
@@ -278,7 +279,7 @@ def test_file_opening_hook(self):
         class CustomOpenHook:
             def __init__(self):
                 self.invoked = False
-            def __call__(self, *args):
+            def __call__(self, *args, **kargs):
                 self.invoked = True
                 return open(*args)
 
@@ -334,6 +335,14 @@ def test_inplace_binary_write_mode(self):
         with open(temp_file, 'rb') as f:
             self.assertEqual(f.read(), b'New line.')
 
+    def test_file_hook_backward_compatibility(self):
+        def old_hook(filename, mode):
+            return io.StringIO("I used to receive only filename and mode")
+        t = self.writeTmp("\n")
+        with FileInput([t], openhook=old_hook) as fi:
+            result = fi.readline()
+        self.assertEqual(result, "I used to receive only filename and mode")
+
     def test_context_manager(self):
         t1 = self.writeTmp("A\nB\nC")
         t2 = self.writeTmp("D\nE\nF")
@@ -529,12 +538,14 @@ class MockFileInput:
     """A class that mocks out fileinput.FileInput for use during unit tests"""
 
     def __init__(self, files=None, inplace=False, backup="", *,
-                 mode="r", openhook=None):
+                 mode="r", openhook=None, encoding=None, errors=None):
         self.files = files
         self.inplace = inplace
         self.backup = backup
         self.mode = mode
         self.openhook = openhook
+        self.encoding = encoding
+        self.errors = errors
         self._file = None
         self.invocation_counts = collections.defaultdict(lambda: 0)
         self.return_values = {}
@@ -637,10 +648,11 @@ def do_test_call_input(self):
         backup = object()
         mode = object()
         openhook = object()
+        encoding = object()
 
         # call fileinput.input() with different values for each argument
         result = fileinput.input(files=files, inplace=inplace, backup=backup,
-            mode=mode, openhook=openhook)
+            mode=mode, openhook=openhook, encoding=encoding)
 
         # ensure fileinput._state was set to the returned object
         self.assertIs(result, fileinput._state, "fileinput._state")
@@ -863,11 +875,15 @@ def test_state_is_not_None(self):
         self.assertIs(fileinput._state, instance)
 
 class InvocationRecorder:
+
     def __init__(self):
         self.invocation_count = 0
+
     def __call__(self, *args, **kwargs):
         self.invocation_count += 1
         self.last_invocation = (args, kwargs)
+        return io.BytesIO(b'some bytes')
+
 
 class Test_hook_compressed(unittest.TestCase):
     """Unit tests for fileinput.hook_compressed()"""
@@ -886,33 +902,43 @@ def test_gz_ext_fake(self):
         original_open = gzip.open
         gzip.open = self.fake_open
         try:
-            result = fileinput.hook_compressed("test.gz", 3)
+            result = fileinput.hook_compressed("test.gz", "3")
         finally:
             gzip.open = original_open
 
         self.assertEqual(self.fake_open.invocation_count, 1)
-        self.assertEqual(self.fake_open.last_invocation, (("test.gz", 3), {}))
+        self.assertEqual(self.fake_open.last_invocation, (("test.gz", "3"), {}))
+
+    @unittest.skipUnless(gzip, "Requires gzip and zlib")
+    def test_gz_with_encoding_fake(self):
+        original_open = gzip.open
+        gzip.open = lambda filename, mode: io.BytesIO(b'Ex-binary string')
+        try:
+            result = fileinput.hook_compressed("test.gz", "3", encoding="utf-8")
+        finally:
+            gzip.open = original_open
+        self.assertEqual(list(result), ['Ex-binary string'])
 
     @unittest.skipUnless(bz2, "Requires bz2")
     def test_bz2_ext_fake(self):
         original_open = bz2.BZ2File
         bz2.BZ2File = self.fake_open
         try:
-            result = fileinput.hook_compressed("test.bz2", 4)
+            result = fileinput.hook_compressed("test.bz2", "4")
         finally:
             bz2.BZ2File = original_open
 
         self.assertEqual(self.fake_open.invocation_count, 1)
-        self.assertEqual(self.fake_open.last_invocation, (("test.bz2", 4), {}))
+        self.assertEqual(self.fake_open.last_invocation, (("test.bz2", "4"), {}))
 
     def test_blah_ext(self):
-        self.do_test_use_builtin_open("abcd.blah", 5)
+        self.do_test_use_builtin_open("abcd.blah", "5")
 
     def test_gz_ext_builtin(self):
-        self.do_test_use_builtin_open("abcd.Gz", 6)
+        self.do_test_use_builtin_open("abcd.Gz", "6")
 
     def test_bz2_ext_builtin(self):
-        self.do_test_use_builtin_open("abcd.Bz2", 7)
+        self.do_test_use_builtin_open("abcd.Bz2", "7")
 
     def do_test_use_builtin_open(self, filename, mode):
         original_open = self.replace_builtin_open(self.fake_open)
@@ -923,7 +949,7 @@ def do_test_use_builtin_open(self, filename, mode):
 
         self.assertEqual(self.fake_open.invocation_count, 1)
         self.assertEqual(self.fake_open.last_invocation,
-                         ((filename, mode), {}))
+                         ((filename, mode), {'encoding': 'locale', 'errors': None}))
 
     @staticmethod
     def replace_builtin_open(new_open_func):
