Creating a new plugin
To extend the functionality of doi_downloader, you can create a new plugin. A plugin is a Python module that implements the Plugin interface. Below is a step-by-step guide to creating a new plugin.
Step 1: Create a new Python file
Create a new Python file in the plugins or extra_plugins directory. Use the extra_plugins folder for plugins that you do not want to commit to GitHub, since that folder is ignored. The name of the file should describe the plugin's functionality, for example my_plugin.py.
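For orientation, a checkout might look roughly like the sketch below. The exact location of the two folders depends on the repository layout, so treat these paths as an illustration only:

    plugins/
        my_plugin.py            # committed together with the project
    extra_plugins/
        my_private_plugin.py    # kept local, ignored by Git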
Step 2: Implement the Plugin interface
In your new Python file, you need to implement the Plugin interface. This involves creating a class that inherits from Plugin and implementing the required methods. Here is an example:
import requests

from doi_downloader.plugins import Plugin
from doi_downloader import article_dataobject as ado  # ArticleDataObject

# Read API keys and other sensitive data from environment variables
MY_API_URL = "https://example.com/{doi}"


class MyPlugin(Plugin):
    def __new__(self):
        instance = super(Plugin, self).__new__(self)
        return instance

    def test(self):
        return True

    def fetch_metadata(self, doi):
        url = MY_API_URL.format(doi=doi)
        try:
            # Query the metadata API for this DOI
            response = requests.get(url)
            if response.status_code == 200:
                paper = response.json()
                title = paper.get("title", "N/A")
                download_link = paper.get("downloadUrl", "N/A")
                data_object = ado.ArticleDataObject(None)
                data_object.set_title(title)
                data_object.set_doi(doi)
                if download_link:
                    data_object.add_pdf_link(download_link)
                return data_object
        except requests.exceptions.RequestException as e:
            print(f"An error occurred: {e}")
        return None

    def get_pdf_url(self, doi, use_cache=True, ttl=0):
        metadata = self.fetch_metadata(doi)
        return metadata.get_pdf_link() if metadata else None
The plugin needs to implement two functions: fetch_metadata and get_pdf_url. The fetch_metadata function should return an ArticleDataObject containing the metadata of the article, while the get_pdf_url function should return the URL of the PDF file. fetch_metadata should handle the API request and parse the response to extract the necessary information.
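As a rough sketch of that contract, here is how an ArticleDataObject can be built and read back, using only the accessors that appear in the examples in this guide (set_title, set_doi, add_pdf_link, get_pdf_link); the real class may offer more:

from doi_downloader import article_dataobject as ado

def build_minimal_object(doi, title, pdf_url):
    # Mirror what MyPlugin.fetch_metadata does with a parsed API response
    data_object = ado.ArticleDataObject(None)
    data_object.set_title(title)
    data_object.set_doi(doi)
    data_object.add_pdf_link(pdf_url)
    return data_object

obj = build_minimal_object("10.1000/xyz123", "An example article", "https://example.com/example.pdf")
print(obj.get_pdf_link())  # this is the value get_pdf_url() should return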
Step 3: Loading and testing the plugin
The doi_downloader loader module will automatically load all plugin files in the plugins or extra_plugins directory. You can test your plugin by loading it with this script:
from doi_downloader import loader as ld

# The loader exposes all discovered plugins, keyed by name
plugins = ld.plugins
my_plugin = plugins["MyPlugin"]

doi = "10.1000/xyz123"  # Replace with a valid DOI
metadata = my_plugin.fetch_metadata(doi)
pdf_url = my_plugin.get_pdf_url(doi)
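For a quick sanity check on the console, you can extend the script with a few prints. get_pdf_link is the accessor used in the caching example below; adjust the call if your plugin exposes the link differently:

if metadata is None:
    print(f"fetch_metadata returned nothing for {doi}")
else:
    print("PDF link from metadata:", metadata.get_pdf_link())
print("get_pdf_url:", pdf_url)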
Step 4: Caching results
It is advantageous to cache the results of the plugin to avoid making repeated API calls for the same DOI. This feature needs to be implemented as part of the plugin. doi_downloader implements a cache object that can be used to store the results of the plugin. The following example shows how to make use of it:
from doi_downloader.plugins import Plugin
# Load the cache object
from doi_downloader.cache_duckdb import Cache
from doi_downloader import article_dataobject as ado


class MyPlugin(Plugin):
    def __new__(self):
        instance = super(Plugin, self).__new__(self)
        # Initialize the cache object with a database file and plugin name.
        # A table with the plugin name will be created in the database so that
        # all plugins can share the same database file.
        self.cache = Cache("database.db", "myplugin")
        return instance

    def test(self):
        return True

    def fetch_metadata(self, doi):
        # Retrieve metadata from the API as in the first example (omitted here)
        return None

    # The get_pdf_url method uses the cache to store and retrieve the PDF URL
    # for a given DOI. If the URL is found in the cache, it is returned.
    # ttl (time to live) can be set to control how recent the cached object should be.
    def get_pdf_url(self, doi, use_cache=True, ttl=0):
        if use_cache:
            # Check the cache first
            cached_data = self.cache.get_cache(doi, ttl=ttl)
            # If cached data is found, return the PDF link from the cached data
            if cached_data:
                data_object = ado.ArticleDataObject.from_json(cached_data)
                data_object.validate()
                return data_object.get_pdf_link()
        # If not found in the cache, fetch metadata from the API
        metadata = self.fetch_metadata(doi)
        if metadata:
            # If the retrieved metadata is valid, store it in the cache
            if use_cache:
                self.cache.set_cache(doi, metadata.to_json())
            return metadata.get_pdf_link()
        else:
            return None
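Calling the plugin then looks the same as before; only the ttl argument changes the behaviour. Assuming get_cache interprets ttl as a maximum age for cached entries (check the cache_duckdb module for the exact unit), usage might look like this once fetch_metadata is filled in:

from doi_downloader import loader as ld

my_plugin = ld.plugins["MyPlugin"]
doi = "10.1000/xyz123"  # Replace with a valid DOI

# First call: nothing is cached yet, so the API is queried and the result stored
pdf_url = my_plugin.get_pdf_url(doi)

# Second call: served from the cache, as long as the stored entry satisfies ttl
pdf_url_again = my_plugin.get_pdf_url(doi, use_cache=True, ttl=3600)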