How to convert a PDF to images using Python


This is not a very difficult problem, because there are many online services and websites for PDF manipulation. However, uploading your files to an online service is risky when the files are important. So I searched for related keywords and found the code I needed. The following code is written in Python.

Install dependencies

First, we have to install the Python package.

pip install pdf2image

This package needs some binaries from the Poppler utilities. If your operating system is Windows, you can download the latest pre-compiled zip file from https://github.com/oschwartz10612/poppler-windows/releases/, unzip it to a path such as C:\Users\Administor\workspace\poppler-24.02.0, and add C:\Users\Administor\workspace\poppler-24.02.0\Library\bin to the PATH environment variable.
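Before going further, it can help to confirm that the Poppler binaries are actually visible from Python. The sketch below is only an illustration: it assumes that pdf2image relies on the Poppler tools pdftoppm and pdfinfo, and the helper name find_poppler_tools is made up for this example.

```python
import shutil

# Poppler command-line tools that pdf2image is assumed to call under the hood.
POPPLER_TOOLS = ("pdftoppm", "pdfinfo")


def find_poppler_tools(names=POPPLER_TOOLS):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_poppler_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If a tool is reported as missing, reopen the terminal so the new PATH is picked up. Alternatively, convert_from_path accepts a poppler_path argument that points at the bin folder directly, so editing PATH is optional.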

Convert PDF file to multiple images

You can run the following code after installing the dependencies:

from os import path
from pdf2image import convert_from_path

pdf_path = "example.pdf"
filename = path.splitext(path.basename(pdf_path))[0]
images = convert_from_path(pdf_path)

for index, image in enumerate(images):
    # you can change the extension to save in another format, such as png or bmp.
    image.save(f'{filename}-{index}.jpg')

You'll get one image per page of the PDF.
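For longer documents, a few optional arguments of convert_from_path are handy: dpi controls the rendering resolution, and first_page / last_page limit the page range. The following is a rough sketch rather than a definitive recipe. The helper names page_filename and convert_pages, the default of 150 dpi, and the 1-based zero-padded naming scheme are all assumptions made for this example.

```python
from os import path


def page_filename(pdf_path, page_number, total_pages, ext="png"):
    """Build a name like 'example-03.png', zero-padded so files sort in page order."""
    stem = path.splitext(path.basename(pdf_path))[0]
    width = len(str(total_pages))
    return f"{stem}-{page_number:0{width}d}.{ext}"


def convert_pages(pdf_path, dpi=150, first_page=None, last_page=None, ext="png"):
    """Render a page range of the PDF and save one image file per page."""
    # Imported here so page_filename() can be used without pdf2image installed.
    from pdf2image import convert_from_path

    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=first_page, last_page=last_page
    )
    start = first_page or 1  # page numbers are 1-based
    last = start + len(images) - 1
    saved = []
    for offset, image in enumerate(images):
        name = page_filename(pdf_path, start + offset, last, ext)
        image.save(name)
        saved.append(name)
    return saved
```

For example, convert_pages("example.pdf", dpi=200, first_page=2, last_page=4) would render only pages 2 to 4. A higher dpi gives sharper images at the cost of larger files and more memory.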

Convert the PDF to one image

We can also merge all the images into one image.

from os import path

from PIL import Image
from pdf2image import convert_from_path

pdf_path = "example.pdf"
filename = path.splitext(path.basename(pdf_path))[0]
images = convert_from_path(pdf_path)
# calculate the total height of all images
total_height = sum(img.height for img in images)
# calculate the maximum width of all images
max_width = max(img.width for img in images)
# create a new Image instance large enough to hold every page
output_image = Image.new(images[0].mode, (max_width, total_height))

upper = 0
for img in images:
    # paste each page at the left edge (x = 0), directly below the previous one.
    output_image.paste(img, (0, upper))
    upper += img.height

# save to disk
output_image.save(f"{filename}.png")
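The merge step can also be wrapped in a reusable function. The sketch below separates the layout arithmetic from the actual pasting. The names stack_layout and stack_vertically are hypothetical, and the white background is an assumption: it only matters when pages have different widths, since narrower pages leave part of the canvas uncovered.

```python
def stack_layout(sizes):
    """Given (width, height) pairs, return the canvas size and each image's top-left offset."""
    canvas_width = max(width for width, _ in sizes)
    offsets = []
    upper = 0
    for _, height in sizes:
        offsets.append((0, upper))  # every image starts at the left edge
        upper += height
    return (canvas_width, upper), offsets


def stack_vertically(images, background="white"):
    """Paste the images top to bottom onto a single RGB canvas."""
    # Imported here so stack_layout() can be used without Pillow installed.
    from PIL import Image

    canvas_size, offsets = stack_layout([img.size for img in images])
    canvas = Image.new("RGB", canvas_size, background)
    for img, offset in zip(images, offsets):
        canvas.paste(img, offset)
    return canvas
```

With this in place, stack_vertically(convert_from_path("example.pdf")).save("example.png") would produce a single tall image of the whole document.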
