How to merge multiple PDFs into one in Python

PDF is one of the most widely used file formats in offices and academia. Because a PDF is hard to change by accident, it is well suited to sharing finished documents. Imagine you have hundreds of office documents in PDF format and you want to join or merge them without using any online service. One solution is to merge the files with Python.

PyPDF2 is a Python library that can join multiple PDF documents. It lets you merge hundreds of PDF documents quickly.

You can use any IDE for Python. Anaconda is recommended and can be easily installed by following this tutorial: How to Install Python/Anaconda/Jupyter Notebook/Spyder on Ubuntu (Linux)

Open the Anaconda prompt and run the command below. It downloads and installs the PyPDF2 library using pip.

pip install PyPDF2

On successful installation, pip prints a message confirming that PyPDF2 was installed.
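If you would like to double-check the installation from Python itself, a small check along these lines could be used. It is only a sketch: the helper name installed_version is made up for this post, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this post; it only reads package metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if PyPDF2 is installed, otherwise None.
print(installed_version("PyPDF2"))
```

If this prints None, the pip command was most likely run in a different Python environment than the one you are using now.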

How to merge multiple PDFs into one in Python?

Merge multiple PDF files using file names

After a successful installation, you can perform different tasks with this library. If you have multiple PDF files in a folder and want to merge only some of them, the following code stores their file names, including the extension, in a list.

The PdfFileMerger class from the PyPDF2 library then merges all the PDFs in the list, in list order, and finally writes the result to a new PDF file.

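# Note: PyPDF2 3.0 and later name this class PdfMerger.
from PyPDF2 import PdfFileMerger

# File names (with extension) of the PDFs to merge, in the desired order
all_pdfs = ['PDF_File_1.pdf', 'PDF_File_2.pdf']
merge_pdfs = PdfFileMerger()
for pdf in all_pdfs:
    # Add every page of this file to the end of the merged document
    merge_pdfs.append(pdf)
# Write the merged document to disk, then release the input files
merge_pdfs.write("Merged_file.pdf")
merge_pdfs.close()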
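The same steps can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the name merge_pdf_files is made up for this post, the function checks that every input file exists before doing any work, and it tries the newer class name PdfMerger first and falls back to PdfFileMerger for older PyPDF2 releases.

```python
from pathlib import Path


def merge_pdf_files(pdf_paths, output_path):
    """Merge the given PDFs, in order, into output_path (hypothetical helper)."""
    paths = [Path(p) for p in pdf_paths]

    # Fail early with a clear message if any input file is missing.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("PDF file(s) not found: " + ", ".join(missing))

    # Imported here so that either class name can be used, depending on
    # which PyPDF2 release is installed.
    try:
        from PyPDF2 import PdfMerger
    except ImportError:
        from PyPDF2 import PdfFileMerger as PdfMerger

    merger = PdfMerger()
    try:
        for path in paths:
            merger.append(str(path))
        merger.write(str(output_path))
    finally:
        # Always release the input files, even if merging fails.
        merger.close()
    return Path(output_path)
```

Calling merge_pdf_files(['PDF_File_1.pdf', 'PDF_File_2.pdf'], 'Merged_file.pdf') would then do the same job as the script above, while a misspelled file name is reported before any output is written.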

Merge all PDF files in a directory

If you want to merge all the PDFs in a directory without listing their names, the following code collects every file ending in .pdf in the current working directory and joins them with a loop and the PyPDF2 library.

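import os
# Note: PyPDF2 3.0 and later name this class PdfMerger.
from PyPDF2 import PdfFileMerger

# os.listdir() returns names in arbitrary order, so sort them for a predictable result
all_pdfs = sorted(a for a in os.listdir() if a.endswith(".pdf"))
merge_pdfs = PdfFileMerger()
for pdf in all_pdfs:
    # Append one file at a time: pass the loop variable, not the whole list
    merge_pdfs.append(pdf)
with open("Merged_file.pdf", "wb") as fout:
    merge_pdfs.write(fout)
# Release the input files once the merged PDF has been written
merge_pdfs.close()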
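Two details are easy to miss when merging a whole directory: a merged file left over from an earlier run also ends in .pdf, and plain alphabetical sorting puts PDF_File_10.pdf before PDF_File_2.pdf. The sketch below shows one possible way to build the file list with the standard library only. The names natural_key and list_pdfs are made up for this post, and the default output name is taken from the example above.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that compares runs of digits as numbers (hypothetical helper)."""
    parts = re.split(r"([0-9]+)", name)
    # Odd positions hold the captured digit runs, even positions hold text.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def list_pdfs(directory=".", exclude=("Merged_file.pdf",)):
    """Return the PDF paths in a directory, naturally sorted (hypothetical helper).

    Files named in `exclude` are skipped so that a previous merge result
    is not merged into itself.
    """
    excluded = {name.lower() for name in exclude}
    pdfs = [
        p for p in Path(directory).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.name.lower() not in excluded
    ]
    return sorted(pdfs, key=lambda p: natural_key(p.name))
```

The resulting list can be passed to the merge loop in place of all_pdfs, converting each path with str() before appending it.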
