Commit f7a8f6d

Merge pull request #532 from mrbean-bremen/pandas-support
Added partial support for pandas to work with pyfakefs
2 parents d543219 + 073d019 commit f7a8f6d

10 files changed (+322 -69 lines)

CHANGES.md (+6)
@@ -3,7 +3,13 @@ The released versions correspond to PyPi releases.
 
 ## Version 4.1.0 (as yet unreleased)
 
+#### New Features
+  * Added some support for pandas (`read_csv`, `read_excel` and more), and
+    for django file locks to work with the fake filesystem
+    (see [#531](../../issues/531))
+
 #### Fixes
+  * `os.expanduser` now works with a bytes path
   * Do not override global warnings setting in `Deprecator`
     (see [#526](../../issues/526))
   * Make sure filesystem modules in `pathlib` are patched

docs/usage.rst (+21 -6)
@@ -273,7 +273,8 @@ names to be faked.
 
 This mechanism is used in pyfakefs itself to patch the external modules
 `pathlib2` and `scandir` if present, and the following example shows how to
-fake a module in Django that uses OS file system functions:
+fake a module in Django that uses OS file system functions (note that this
+has now been integrated into pyfakefs):
 
 .. code:: python
 
@@ -356,6 +357,17 @@ if the real user is a root user (e.g. has the user ID 0). If you want to run
 your tests as a non-root user regardless of the actual user rights, you may
 want to set this to ``False``.
 
+use_known_patches
+~~~~~~~~~~~~~~~~~
+Some libraries are known to require patching in order to work with pyfakefs.
+If ``use_known_patches`` is set to ``True`` (the default), pyfakefs patches
+these libraries so that they will work with the fake filesystem. Currently, this
+includes patches for ``pandas`` read methods like ``read_csv`` and
+``read_excel``--more may follow. Ordinarily, the default value of
+``use_known_patches`` should be used, but it is present to allow users to
+disable this patching in case it causes any problems. It may be removed or
+replaced by more fine-grained arguments in future releases.
+
 Using convenience methods
 -------------------------
 While ``pyfakefs`` can be used just with the standard Python file system
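
As a rough illustration of the `use_known_patches` argument documented in this hunk, the sketch below disables the known patches for a single `Patcher` context. It assumes pyfakefs 4.1.0 or later; the file path and the helper name `read_from_fake_fs` are invented for the example, and the import guard is only there so the snippet still loads where pyfakefs is not installed.

```python
try:
    from pyfakefs.fake_filesystem_unittest import Patcher
except ImportError:  # pyfakefs is an optional dependency of this sketch
    Patcher = None


def read_from_fake_fs(use_known_patches=True):
    """Create a file in the fake filesystem and read it back.

    Returns None if pyfakefs is not available.
    """
    if Patcher is None:
        return None
    # use_known_patches=False skips the library-specific patches (e.g. pandas).
    with Patcher(use_known_patches=use_known_patches) as patcher:
        # Hypothetical path; it only exists inside the fake filesystem.
        patcher.fs.create_file('/data/example.csv', contents='a,b\n1,2\n')
        with open('/data/example.csv') as f:
            return f.read()


if __name__ == '__main__':
    print(read_from_fake_fs(use_known_patches=False))
```

Plain file access behaves the same with either setting; the flag should only matter for code paths in the patched libraries themselves.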
@@ -592,9 +604,9 @@ reasons:
   are more examples for patches that may be useful, we may add them in the
   documentation.
 - It uses C libraries to access the file system. There is no way to make
-  such a module work with ``pyfakefs`` - if you want to use it, you have to
-  patch the whole module. In some cases, a library implemented in Python with
-  a similar interface already exists. An example is ``lxml``,
+  such a module work with ``pyfakefs``--if you want to use it, you
+  have to patch the whole module. In some cases, a library implemented in
+  Python with a similar interface already exists. An example is ``lxml``,
   which can be substituted with ``ElementTree`` in most cases for testing.
 
 A list of Python modules that are known to not work correctly with
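
The `lxml` substitution mentioned in this hunk could look roughly like the following in code under test. This is only a sketch: `count_items` and the `<item>` element name are invented for illustration, and it relies on the small part of the `etree` API (`parse`, `getroot`, `findall`) that `lxml.etree` and the standard library's `xml.etree.ElementTree` have in common.

```python
import xml.etree.ElementTree as ElementTree


def count_items(xml_path, etree=ElementTree):
    """Count the <item> children of the root element of an XML file.

    Production code could pass `lxml.etree` as `etree`; tests can keep the
    standard-library default, which opens the file from Python code.
    """
    tree = etree.parse(xml_path)
    return len(tree.getroot().findall('item'))
```

Passing the parser module as an argument keeps the substitution local to the test, so the production import of `lxml` does not have to change.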
@@ -606,8 +618,11 @@ A list of Python modules that are known to not work correctly with
 - the ``Pillow`` image library does not work with pyfakefs at least if writing
   JPEG files (see `this issue <https://github.com/jmcgeheeiv/pyfakefs/issues/529>`__)
 - ``pandas`` (the Python data analysis library) uses its own internal file
-  system access, written in C, and does therefore not work with pyfakefs
-  (see `this issue <https://github.com/jmcgeheeiv/pyfakefs/issues/528>`__)
+  system access written in C. Thus much of ``pandas`` will not work with
+  ``pyfakefs``. Having said that, ``pyfakefs`` patches ``pandas`` so that many
+  of the ``read_xxx`` functions, including ``read_csv`` and ``read_excel``,
+  as well as some writer functions, do work with the fake file system. If
+  you use only these functions, ``pyfakefs`` will work with ``pandas``.
 
 If you are not sure if a module can be handled, or how to do it, you can
 always write a new issue, of course!

extra_requirements.txt (+7 -1)
@@ -9,5 +9,11 @@
 # available at the time of writing.
 
 pathlib2>=2.3.2
-
 scandir>=1.8
+
+# pandas + xlrd are used to test pandas-specific patches to allow
+# pyfakefs to work with pandas
+# we use the latest version to see any problems with new versions
+pandas
+xlrd
+openpyxl

pyfakefs/fake_filesystem.py (+31 -40)
@@ -112,7 +112,8 @@
 from pyfakefs.helpers import (
     FakeStatResult, FileBufferIO, NullFileBufferIO,
     is_int_type, is_byte_string, is_unicode_string,
-    make_string_path, IS_WIN, to_string)
+    make_string_path, IS_WIN, to_string, matching_string
+)
 from pyfakefs import __version__  # noqa: F401 for upwards compatibility
 
 __pychecker__ = 'no-reimportself'
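
The `matching_string` helper imported here lives in `pyfakefs.helpers`, which is not part of the files shown. Assuming it keeps the body of the `_matching_string` static method that this commit removes from `fake_filesystem.py`, a standalone equivalent would look like this sketch:

```python
import locale


def matching_string(matched, string):
    """Return `string` as bytes or str, matching the type of `matched`.

    Assumes that `string` only contains ASCII characters.
    """
    if string is None:
        return string
    if isinstance(matched, bytes) and isinstance(string, str):
        # Encode with the locale's preferred encoding, as the old method did.
        return string.encode(locale.getpreferredencoding(False))
    return string


# The separator follows the type of the path it is used with.
print(matching_string('/foo/bar', '/'))    # stays a str
print(matching_string(b'/foo/bar', '/'))   # encoded to bytes
print(matching_string(b'/foo/bar', None))  # None is passed through
```

Since the function never used `self`, moving it to module level lets classes other than the filesystem class call it directly.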
@@ -958,24 +959,13 @@ def raise_os_error(self, errno, filename=None, winerror=None):
             raise OSError(errno, message, filename, winerror)
         raise OSError(errno, message, filename)
 
-    @staticmethod
-    def _matching_string(matched, string):
-        """Return the string as byte or unicode depending
-        on the type of matched, assuming string is an ASCII string.
-        """
-        if string is None:
-            return string
-        if isinstance(matched, bytes) and isinstance(string, str):
-            return string.encode(locale.getpreferredencoding(False))
-        return string
-
     def _path_separator(self, path):
         """Return the path separator as the same type as path"""
-        return self._matching_string(path, self.path_separator)
+        return matching_string(path, self.path_separator)
 
     def _alternative_path_separator(self, path):
         """Return the alternative path separator as the same type as path"""
-        return self._matching_string(path, self.alternative_path_separator)
+        return matching_string(path, self.alternative_path_separator)
 
     def _starts_with_sep(self, path):
         """Return True if path starts with a path separator."""
@@ -1035,10 +1025,10 @@ def to_str(string):
         path = self.absnormpath(self._original_path(path))
         if path in self.mount_points:
             return self.mount_points[path]
-        mount_path = self._matching_string(path, '')
+        mount_path = matching_string(path, '')
         drive = self.splitdrive(path)[:1]
         for root_path in self.mount_points:
-            root_path = self._matching_string(path, root_path)
+            root_path = matching_string(path, root_path)
             if drive and not root_path.startswith(drive):
                 continue
             if path.startswith(root_path) and len(root_path) > len(mount_path):
@@ -1377,8 +1367,8 @@ def normpath(self, path):
         is_absolute_path = path.startswith(sep)
         path_components = path.split(sep)
         collapsed_path_components = []
-        dot = self._matching_string(path, '.')
-        dotdot = self._matching_string(path, '..')
+        dot = matching_string(path, '.')
+        dotdot = matching_string(path, '..')
         for component in path_components:
             if (not component) or (component == dot):
                 continue
@@ -1453,18 +1443,18 @@ def absnormpath(self, path):
             or the root directory if path is empty.
         """
         path = self.normcase(path)
-        cwd = self._matching_string(path, self.cwd)
+        cwd = matching_string(path, self.cwd)
         if not path:
             path = self.path_separator
-        if path == self._matching_string(path, '.'):
+        if path == matching_string(path, '.'):
             path = cwd
         elif not self._starts_with_root_path(path):
             # Prefix relative paths with cwd, if cwd is not root.
-            root_name = self._matching_string(path, self.root.name)
-            empty = self._matching_string(path, '')
+            root_name = matching_string(path, self.root.name)
+            empty = matching_string(path, '')
             path = self._path_separator(path).join(
                 (cwd != root_name and cwd or empty, path))
-        if path == self._matching_string(path, '.'):
+        if path == matching_string(path, '.'):
             path = cwd
         return self.normpath(path)
 
@@ -1489,7 +1479,7 @@ def splitpath(self, path):
 
         starts_with_drive = self._starts_with_drive_letter(path)
         basename = path_components.pop()
-        colon = self._matching_string(path, ':')
+        colon = matching_string(path, ':')
         if not path_components:
             if starts_with_drive:
                 components = basename.split(colon)
@@ -1545,7 +1535,7 @@ def splitdrive(self, path):
                     if sep_index2 == -1:
                         sep_index2 = len(path)
                     return path[:sep_index2], path[sep_index2:]
-                if path[1:2] == self._matching_string(path, ':'):
+                if path[1:2] == matching_string(path, ':'):
                     return path[:2], path[2:]
         return path[:0], path
 
@@ -1579,7 +1569,7 @@ def _join_paths_with_drive_support(self, *all_paths):
                 result_path = result_path + sep
             result_path = result_path + path_part
         # add separator between UNC and non-absolute path
-        colon = self._matching_string(base_path, ':')
+        colon = matching_string(base_path, ':')
         if (result_path and result_path[:1] not in seps and
                 result_drive and result_drive[-1:] != colon):
             return result_drive + sep + result_path
@@ -1613,7 +1603,7 @@ def joinpaths(self, *paths):
                     joined_path_segments.append(sep)
                 if path_segment:
                     joined_path_segments.append(path_segment)
-        return self._matching_string(paths[0], '').join(joined_path_segments)
+        return matching_string(paths[0], '').join(joined_path_segments)
 
     def _path_components(self, path):
         """Breaks the path into a list of component names.
@@ -1664,20 +1654,20 @@ def _starts_with_drive_letter(self, file_path):
             `True` if drive letter support is enabled in the filesystem and
             the path starts with a drive letter.
         """
-        colon = self._matching_string(file_path, ':')
+        colon = matching_string(file_path, ':')
         return (self.is_windows_fs and len(file_path) >= 2 and
                 file_path[:1].isalpha and (file_path[1:2]) == colon)
 
     def _starts_with_root_path(self, file_path):
-        root_name = self._matching_string(file_path, self.root.name)
+        root_name = matching_string(file_path, self.root.name)
         file_path = self._normalize_path_sep(file_path)
         return (file_path.startswith(root_name) or
                 not self.is_case_sensitive and file_path.lower().startswith(
                     root_name.lower()) or
                 self._starts_with_drive_letter(file_path))
 
     def _is_root_path(self, file_path):
-        root_name = self._matching_string(file_path, self.root.name)
+        root_name = matching_string(file_path, self.root.name)
         return (file_path == root_name or not self.is_case_sensitive and
                 file_path.lower() == root_name.lower() or
                 2 <= len(file_path) <= 3 and
@@ -1860,7 +1850,7 @@ def _resolve_components(self, path_components, raw_io):
     def _valid_relative_path(self, file_path):
         if self.is_windows_fs:
             return True
-        slash_dotdot = self._matching_string(
+        slash_dotdot = matching_string(
             file_path, self.path_separator + '..')
         while file_path and slash_dotdot in file_path:
             file_path = file_path[:file_path.rfind(slash_dotdot)]
@@ -2026,7 +2016,7 @@ def lresolve(self, path):
 
         # remove trailing separator
         path = self._path_without_trailing_separators(path)
-        if path == self._matching_string(path, '.'):
+        if path == matching_string(path, '.'):
             path = self.cwd
         path = self._original_path(path)
 
@@ -2260,8 +2250,8 @@ def remove_object(self, file_path):
 
     def make_string_path(self, path):
         path = make_string_path(path)
-        os_sep = self._matching_string(path, os.sep)
-        fake_sep = self._matching_string(path, self.path_separator)
+        os_sep = matching_string(path, os.sep)
+        fake_sep = matching_string(path, self.path_separator)
         return path.replace(os_sep, fake_sep)
 
     def create_dir(self, directory_path, perm_bits=PERM_DEF):
@@ -2756,8 +2746,7 @@ def makedir(self, dir_name, mode=PERM_DEF):
         parent_dir, _ = self.splitpath(dir_name)
         if parent_dir:
             base_dir = self.normpath(parent_dir)
-            ellipsis = self._matching_string(
-                parent_dir, self.path_separator + '..')
+            ellipsis = matching_string(parent_dir, self.path_separator + '..')
             if parent_dir.endswith(ellipsis) and not self.is_windows_fs:
                 base_dir, dummy_dotdot, _ = parent_dir.partition(ellipsis)
             if not self.exists(base_dir):
@@ -3333,8 +3322,8 @@ def _joinrealpath(self, path, rest, seen):
         encountered in the second path.
         Taken from Python source and adapted.
         """
-        curdir = self.filesystem._matching_string(path, '.')
-        pardir = self.filesystem._matching_string(path, '..')
+        curdir = matching_string(path, '.')
+        pardir = matching_string(path, '..')
 
         sep = self.filesystem._path_separator(path)
         if self.isabs(rest):
@@ -3385,8 +3374,10 @@ def expanduser(self, path):
         """Return the argument with an initial component of ~ or ~user
         replaced by that user's home directory.
         """
-        return self._os_path.expanduser(path).replace(
-            self._os_path.sep, self.sep)
+        path = self._os_path.expanduser(path)
+        return path.replace(
+            matching_string(path, self._os_path.sep),
+            matching_string(path, self.sep))
 
     def ismount(self, path):
         """Return true if the given path is a mount point.

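The `expanduser` hunk is the fix listed in CHANGES.md as "`os.expanduser` now works with a bytes path": `bytes.replace` rejects `str` arguments, so the old one-liner raised a `TypeError` for bytes paths. The sketch below reproduces the problem and the fix outside pyfakefs. `replace_sep` is a hypothetical stand-in for the patched method body, and `matching_string` is re-implemented on the assumption that it behaves like the removed `_matching_string`.

```python
import locale


def matching_string(matched, string):
    """Return `string` as bytes if `matched` is bytes, else unchanged."""
    if string is None:
        return string
    if isinstance(matched, bytes) and isinstance(string, str):
        return string.encode(locale.getpreferredencoding(False))
    return string


def replace_sep(path, os_sep, fake_sep):
    """Replace `os_sep` by `fake_sep` in a str or bytes path."""
    return path.replace(matching_string(path, os_sep),
                        matching_string(path, fake_sep))


# Old behaviour: str separators applied to a bytes path fail.
try:
    b'C:\\Users\\me'.replace('\\', '/')
except TypeError as exc:
    print('old code fails:', exc)

# New behaviour: the separators are converted to the type of the path first.
print(replace_sep(b'C:\\Users\\me', '\\', '/'))
print(replace_sep('C:\\Users\\me', '\\', '/'))
```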