Replacing image in a PDF with Python

2019-07-30T18:27:09+05:30

Hello,
Very good work!
I’m trying to replicate your example but I get a corrupted PDF.
What format should the image be in?
Thank you and congratulations for your code.

2019-07-31T06:54:01+05:30

Thank you. Ideally, the new image should have the same aspect ratio and the same file format (PNG/JPEG) as the image it replaces.
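A quick way to check both conditions before attempting a replacement is sketched below. This is only an illustration: the helper names `sniff_format` and `same_aspect_ratio` and the 1% tolerance are assumptions, not part of the original script. The format check relies on the standard PNG and JPEG file signatures.

```python
# File signatures ("magic bytes") at the start of PNG and JPEG files.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_format(data):
    """Guess the image format from the first bytes of the file (hypothetical helper)."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return "unknown"


def same_aspect_ratio(size_a, size_b, tolerance=0.01):
    """Compare two (width, height) pairs; the tolerance value is an arbitrary choice."""
    (width_a, height_a), (width_b, height_b) = size_a, size_b
    ratio_a = width_a / height_a
    ratio_b = width_b / height_b
    return abs(ratio_a - ratio_b) <= tolerance * ratio_b


old_size, new_size = (200, 100), (400, 200)
print(same_aspect_ratio(old_size, new_size))
print(sniff_format(JPEG_MAGIC + b"rest of the file"))
```

The sizes can come from `Image.open(path).size` in Pillow, and the bytes from reading each image file in "rb" mode.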

2023-01-13T16:59:38+05:30

Hi, I want to replace an image that appears N times in a PDF with another image of the same type. It is a simple logo replacement. How should the Python script change? Thanks
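One possible starting point for this, offered as a sketch rather than something taken from the original article: list the objects in the uncompressed PDF whose dictionary mentions `/Subtype /Image`, then run the replacement once per matching object (the script would need to take the object ID as a parameter instead of hard-coding it). The function name `find_image_objects` and the sample bytes are hypothetical, and a plain byte search like this can in principle be fooled by stream data that happens to contain the same text.

```python
import re

# Matches an object header such as b"\n11 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"\n(\d+) (\d+) obj\b")


def find_image_objects(pdf_bytes):
    """Return the object numbers whose dictionary mentions /Subtype /Image."""
    image_ids = []
    for match in OBJ_HEADER.finditer(pdf_bytes):
        header_end = match.end()
        # The dictionary ends where the stream begins, or at "endobj" if there is no stream.
        stops = [
            pos
            for pos in (
                pdf_bytes.find(b"stream", header_end),
                pdf_bytes.find(b"endobj", header_end),
            )
            if pos != -1
        ]
        if not stops:
            continue
        dictionary = pdf_bytes[header_end:min(stops)]
        if re.search(rb"/Subtype\s*/Image\b", dictionary):
            image_ids.append(int(match.group(1)))
    return image_ids


# Made-up miniature "PDF" used only to exercise the function.
sample = (
    b"%PDF-1.4\n10 0 obj\n<<\n/Type /Page\n>>\nendobj\n"
    b"11 0 obj\n<<\n/Subtype /Image\n/Width 10\n>>\nstream\nDATA\nendstream\nendobj\n"
    b"12 0 obj\n<< /Subtype /Image /Width 5 >>\nstream\nX\nendstream\nendobj\n"
)
print(find_image_objects(sample))
```

This only finds candidates. Telling which of them is the logo still needs a manual check, for example by comparing the `/Width` and `/Height` entries.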

2019-11-04T01:53:55+05:30

I tried to run your code but it gives me:
> TypeError: a bytes-like object is required, not 'str'
which is referring to:
> f.write(contents[:start])

2019-11-08T16:49:01+05:30

I don’t remember whether I wrote the script in Python 2 or 3, but judging by the error you posted, I probably used Python 2, where opening a file in "r" mode returns bytes. You appear to be using Python 3, where a file opened in "r" mode is read as text (str). Try opening the file in "rb" mode instead of "r" and see if that solves your problem.
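A minimal sketch of the difference in Python 3. The file name and its contents are made up for illustration, and the text-mode read uses `latin-1` only because that encoding can decode any byte value.

```python
import os
import tempfile

# Hypothetical stand-in for an uncompressed PDF; the bytes are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "sample.pdf")
with open(path, "wb") as f:
    f.write(b"%PDF-1.4\n11 0 obj\nstream\n\x89\xff\x00\nendstream\n")

# Binary mode returns bytes, which is what slicing and rewriting raw PDF data needs.
with open(path, "rb") as f:
    contents = f.read()

# Text mode returns str, which cannot be written back to a file opened in "wb" mode.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

print(type(contents).__name__, type(text).__name__)

# With bytes contents, the search pattern must be bytes as well.
start = contents.find(b"\n11 0 obj")
```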

2020-07-25T19:09:13+05:30

I tried your code, but I am getting the same error:
TypeError: a bytes-like object is required, not 'str'
I am using Python 3.6, and I open the uncompressed.pdf file in "rb" mode.
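One possible cause, offered as a guess since the modified script is not shown: once the file is read in "rb" mode, `contents` is a bytes object, so every pattern passed to `find` and everything passed to `write` on a file opened in "wb" mode must be bytes too. A small sketch with made-up data:

```python
# Made-up fragment standing in for the bytes read from an uncompressed PDF.
contents = b"\n11 0 obj\n<<\n/Width 10\n>>\nstream\nDATA\nendstream\n"

# Searching bytes with a str pattern raises TypeError in Python 3.
try:
    contents.find("stream")
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

# Encoding the pattern (or using a bytes literal such as b"stream") works.
stream = contents.find("stream".encode("ascii"))

# Text built with str.format must be encoded before writing to a "wb" file.
new_width_line = "/Width {0}".format(20).encode("ascii")

print(mixed_types_ok, stream, new_width_line)
```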

2020-08-30T03:45:52+05:30

This is such a useful project! Thanks, Arunmozhi!
I’m having similar issues with encodings. Would you mind emailing me so I can share my code with you?
Thanks!

2020-08-30T06:09:36+05:30

I no longer maintain this code. Kindly use this as a starter and build your own solution.

2020-10-21T15:36:07+05:30

Hi sir, is it possible to put text in place of an image?

2021-01-26T22:04:05+05:30

I am not sure actually.

2022-11-14T16:52:54+05:30

I adapted this code into a version that works with a current version of Python (3.10.6).

You can find the code on GitHub:

	import os
	import sys

	from PIL import Image

	# Include the \n to ensure an exact match and avoid partial matches such as 111, 211…
	OBJECT_ID = b"\n11 0 obj"


	def replace_image(filepath, new_image):
	    # Read the whole (uncompressed) PDF as bytes
	    with open(filepath, "rb") as f:
	        contents = f.read()

	    # Dimensions and byte length of the replacement image
	    with Image.open(new_image) as image:
	        width, height = image.size
	    length = os.path.getsize(new_image)

	    # Locate the image object and the start of its stream data
	    start = contents.find(OBJECT_ID)
	    if start == -1:
	        raise ValueError("Object 11 0 not found; is the PDF uncompressed?")
	    stream = contents.find(b"stream", start)
	    image_beginning = stream + 7  # len("stream") plus the newline after it

	    # Process the metadata and update it with the new image's details
	    meta = contents[start:image_beginning].split(b"\n")
	    new_meta = []
	    for item in meta:
	        if b"/Width" in item:
	            new_meta.append("/Width {0}".format(width))
	        elif b"/Height" in item:
	            new_meta.append("/Height {0}".format(height))
	        elif b"/Length" in item:
	            new_meta.append("/Length {0}".format(length))
	        else:
	            new_meta.append(item.decode("utf-8"))
	    new_meta = "\n".join(new_meta)

	    # Find the end location (the newline just before "endstream")
	    image_end = contents.find(b"endstream", stream) - 1

	    # Read the new image
	    with open(new_image, "rb") as f:
	        new_image_data = f.read()

	    # Recreate the PDF file with the new image
	    with open(filepath, "wb") as f:
	        f.write(contents[:start])
	        f.write(b"\n")
	        f.write(new_meta.encode("utf-8"))
	        f.write(new_image_data)
	        f.write(contents[image_end:])


	# replace_image('pdfuncompressedfile.pdf', 'new_image')

	if __name__ == "__main__":
	    if len(sys.argv) == 3:
	        replace_image(sys.argv[1], sys.argv[2])
	    else:
	        print("Usage: python process.py <pdfuncompressedfile> <new_image>")

Nice work Arunmozhi!

2023-01-13T17:31:22+05:30

Hi,
Whichever file I use for the example, the error is the same:

File “….\process.py”, line 38
image_end = contents.find(“endstream”, stream) – 1
^
SyntaxError: invalid character ‘–’ (U+2013)

How can I resolve it?

2023-01-14T17:35:34+05:30

Hi, I wrote this code 4 years ago and don’t remember it well enough to debug this via comments. I suggest you use the article and the code as a guide and rewrite it to suit your needs. I am sorry I can’t help more than this.

2024-01-19T06:40:30+05:30

What looks like a minus sign in the code above is actually an en dash character (U+2013). Replace it with a plain minus sign (-).
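Code copied from a web page often picks up typographic quotes in the same way. A small sketch that converts these characters back to their ASCII equivalents before the script is saved; the helper name `clean_pasted_code` and the choice of characters covered are assumptions:

```python
# Typographic characters that blog platforms commonly substitute into code.
REPLACEMENTS = {
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}


def clean_pasted_code(source):
    """Return the source with typographic characters replaced by ASCII ones."""
    return source.translate(str.maketrans(REPLACEMENTS))


pasted = "image_end = contents.find(\u201cendstream\u201d, stream) \u2013 1"
print(clean_pasted_code(pasted))
```

Run this over the pasted text itself, not over a script that already runs, since it would also alter any such characters that appear intentionally inside string literals.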

2023-01-14T19:53:46+05:30

Hey, try the most recently posted script, adapted from Arunmozhi’s old code; you can find it three comments above. Hope this helps.

Step 1 – Understanding the format

Step 2 – Uncompressing the PDF and extracting the images

Step 3 – Identifying the image to replace

Step 4 – Identifying the object in PDF that represents the image

Step 5 – Replacing the image with another image

Step 6 – Compressing the file back (OPTIONAL)

Author: Arunmozhi

15 thoughts on “Replacing image in a PDF with Python”
