How to Convert a PDF to Images Using Python
This is not a difficult problem, because many online services and websites offer PDF manipulation. Uploading files to an online service is risky when the files are important, though, so I searched for a way to do it locally and found the code I needed. The following code is written in Python.
Install dependencies
First, install the required Python package:
pip install pdf2image
This package needs some binaries from the Poppler utilities. If your operating system is Windows, you can download the latest pre-compiled zip file from https://github.com/oschwartz10612/poppler-windows/releases/, unzip it to a path such as C:\Users\YE\workspace\poppler-24.02.0, and add C:\Users\YE\workspace\poppler-24.02.0\Library\bin to the PATH environment variable.
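Before running any conversion, it can help to confirm that Python can actually see the Poppler binaries. The following is a minimal sketch using only the standard library; the helper name find_poppler_tool is hypothetical, and it assumes the check is made for pdftoppm, one of the Poppler tools that pdf2image calls.

```python
import shutil


def find_poppler_tool(tool="pdftoppm"):
    """Return the full path of a Poppler executable found on PATH, or None."""
    return shutil.which(tool)


location = find_poppler_tool()
if location is None:
    print("pdftoppm not found on PATH; pdf2image will not be able to render pages")
else:
    print(f"Poppler found: {location}")
```

If you prefer not to edit PATH, convert_from_path also accepts a poppler_path argument that can point at the same bin folder.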
Convert PDF file to multiple images
After installing the dependencies, you can run the following code:
from os import path
from pdf2image import convert_from_path

pdf_path = "example.pdf"
# base name of the PDF without its extension, e.g. "example"
filename = path.splitext(path.basename(pdf_path))[0]
# render every page of the PDF to a PIL image
images = convert_from_path(pdf_path)
for index, image in enumerate(images):
    # you can change the extension to other formats, such as png or bmp
    image.save(f'{filename}-{index}.jpg')
You'll get one image per page of the PDF, named example-0.jpg, example-1.jpg, and so on.
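The loop above numbers files from 0 without padding, so with more than ten pages a file browser may list page 10 before page 2. As one possible refinement, here is a small sketch that builds 1-based, zero-padded names; the helper page_filename and its parameters are hypothetical and not part of pdf2image.

```python
from pathlib import Path


def page_filename(pdf_path, page_number, total_pages, ext="jpg"):
    """Build an output name such as 'example-03.jpg' for a 1-based page number."""
    stem = Path(pdf_path).stem            # file name without its last extension
    width = len(str(total_pages))         # digits needed for the largest page number
    return f"{stem}-{page_number:0{width}d}.{ext}"


names = [page_filename("example.pdf", n, 12) for n in (1, 2, 12)]
print(names)  # ['example-01.jpg', 'example-02.jpg', 'example-12.jpg']
```

In the conversion loop, this would be used as image.save(page_filename(pdf_path, index, len(images))) with enumerate(images, start=1).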
Convert the PDF to one image
We can also merge all the page images into one image by stacking them vertically.
from os import path
from PIL import Image
from pdf2image import convert_from_path

pdf_path = "example.pdf"
filename = path.splitext(path.basename(pdf_path))[0]
images = convert_from_path(pdf_path)

# the canvas is as tall as all pages stacked and as wide as the widest page
total_height = sum(img.height for img in images)
max_width = max(img.width for img in images)
# create a new Image instance
output_image = Image.new(images[0].mode, (max_width, total_height))
upper = 0
for img in images:
    # paste each page at the left edge (x = 0), directly below the previous one
    output_image.paste(img, (0, upper))
    upper += img.height
# save to disk
output_image.save(f"{filename}.png")
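The layout arithmetic in the merge step can be checked on its own, without rendering any PDF. The sketch below is an illustration only: stack_layout is a hypothetical helper, the page sizes are made-up numbers, and every page is assumed to be pasted at x = 0.

```python
def stack_layout(sizes):
    """Given (width, height) pairs, return the canvas size and each paste offset."""
    canvas_width = max(width for width, _ in sizes)
    offsets = []
    upper = 0
    for _, height in sizes:
        offsets.append((0, upper))   # left edge, directly below the previous page
        upper += height
    return (canvas_width, upper), offsets


# made-up page sizes in pixels: two portrait pages and one landscape page
canvas, offsets = stack_layout([(1654, 2339), (1654, 2339), (2339, 1654)])
print(canvas)   # (2339, 6332)
print(offsets)  # [(0, 0), (0, 2339), (0, 4678)]
```

Pages narrower than the canvas leave the rest of their row at Pillow's default fill, which is black. Passing a color such as "white" as the third argument to Image.new avoids that.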