1
0
Fork 0
mirror of https://github.com/ansible-collections/community.general.git synced 2024-09-14 20:13:21 +02:00

archive: Generate crc32 over 16MiB chunks (#6274)

* archive: Generate crc32 over 16MiB chunks

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

---------

Co-authored-by: Felix Fontein <felix@fontein.de>
This commit is contained in:
Nils Meyer 2023-04-13 06:40:16 +02:00 committed by GitHub
parent aa77a88f4b
commit 14b19afc9a
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 9 additions and 1 deletions

View file

@ -0,0 +1,2 @@
bugfixes:
- archive - reduce RAM usage by generating CRC32 checksum over chunks (https://github.com/ansible-collections/community.general/pull/6274).

View file

@ -608,7 +608,13 @@ class TarArchive(Archive):
# The python implementations of gzip, bz2, and lzma do not support restoring compressed files
# to their original names so only file checksum is returned
f = self._open_compressed_file(_to_native_ascii(path), 'r')
checksums = set([(b'', crc32(f.read()))])
checksum = 0
while True:
chunk = f.read(16 * 1024 * 1024)
if not chunk:
break
checksum = crc32(chunk, checksum)
checksums = set([(b'', checksum)])
f.close()
except Exception:
checksums = set()