Creating a new plugin
To extend the functionality of doi_downloader, you can create a new plugin for retrieving PDFs related to DOI from
a specific website. A plugin is a Python module that implements the Plugin interface. Below is a step-by-step guide
to creating a new plugin.
Step 1: Create a new Python file
Create a new Python file in the doi_downloader/plugins directory or (for testing only) in the directory
doi_downloader/extra_plugins. The name of the file should be descriptive of the plugin's functionality,
for example, my_plugin.py. Plugins stored in the extra_plugins directory, will not be stored on Github.
Step 2: Implement the Plugin interface
In your new Python file, you need to implement the Plugin interface. This involves creating a class that inherits
from Plugin and implementing the required methods. Here is an example:
import requests
from doi_downloader.plugins import Plugin
from doi_downloader import article_dataobject as ado
MY_API_URL = "https://example.com/{doi}"
class MyPlugin(Plugin):
def __new__(self):
instance = super(Plugin, self).__new__(self)
return instance
def test(self):
return True
def fetch_metadata(self, doi):
url = MY_API_URL.format(doi=doi)
try:
response = requests.get(url, headers=headers, params=params)
if response.status_code == 200:
paper = response.json()
title = paper.get("title", "N/A")
download_link = paper.get("downloadUrl", "N/A")
data_object = ado.ArticleDataObject(None)
data_object.set_title(title)
data_object.set_doi(doi)
if download_link:
data_object.add_pdf_link(download_link)
return data_object
except requests.exceptions.RequestException as e:
print(f"An error occurred: {e}")
return None
def get_pdf_url(self, doi, use_cache=True, ttl=0):
metadata = self.fetch_metadata(doi)
return metadata.get_pdf_url() if metadata else None
The plugin needs to implement two functions: fetch_metadata and get_pdf_url. The fetch_metadata function should return an ArticleDataObject containing the metadata of the article, while the get_pdf_url function should return the URL of the PDF file.
fetch_metadata should handle the API request and parse the response to extract the necessary information.
Step 3: Loading and testing the plugin
The doi_downloader loader module will automtically load all plugin files in the plugins or extra_plugins directory. You can test your plugin by loading your plugin withthis script:
from doi_downloader import loader as ld
doi = "10.1000/xyz123"
metadata = ld.plugins["MyPlugin"].fetch_metadata(doi)
pdf_url = ld.plugins["MyPlugin"].get_pdf_url(doi)
Step 4: Caching API results
It is advantageous to cache the results of the plugin to avoid making repeated API calls for the same DOI. This feature needs to be implemented as part of the plugin.
doi_downloader implements a cache object that can be used to store the results of the plugin. The following example shows how to make use of it:
from doi_downloader.plugins import Plugin
from doi_downloader.cache_duckdb import Cache
from doi_downloader import article_dataobject as ado
class MyPlugin(Plugin):
def __new__(self):
instance = super(Plugin, self).__new__(self)
self.cache = Cache("database.db", "myplugin")
return instance
def test(self):
return True
def fetch_metadata(self, doi):
return None
def get_pdf_url(self, doi, use_cache=True, ttl=0):
if use_cache:
cached_data = self.cache.get_cache(doi, ttl=ttl)
if cached_data:
data_object = ado.ArticleDataObject.from_json(cached_data)
data_object.validate()
return data_object.get_pdf_link()
metadata = self.fetch_metadata(doi)
if metadata:
if use_cache:
self.cache.set_cache(doi, metadata.to_json())
return metadata.get_pdf_link()
else:
return None
The available plugins in the doi_downloader/plugins directory can be inspected for more example plugins code.