Joachim Breitner

Generating bibtex bibliographies from DOIs via DBLP

Published 2023-07-12 in sections English, Digital World.

I sometimes write papers, and part of paper writing is assembling the bibliography. In my case, this is done using BibTeX. So when I need to add another citation, I have to find suitable data in BibTeX format.

Often I copy snippets from the .bib files of earlier papers.

Or I search for the paper on DBLP, which in my experience has the highest-quality BibTeX entries and the best coverage of computer-science-related publications, copy the entry to my .bib file, and change the key to whatever I want to refer to the paper by.

But in the days of pervasive use of DOIs (digital object identifiers) for almost all publications, manually maintaining the data in .bib files seems outdated. Instead, I’d rather just write down the two pieces of data I care about: the key that I want to use for citation, and the DOI. The rest I do not want to be bothered with.

So I wrote a small script that takes a .yaml file like

entries:
  unsafePerformIO: 10.1007/10722298_3
  dejafu: 10.1145/2804302.2804306
  runST: 10.1145/3158152
  quickcheck: 10.1145/351240.351266
  optimiser: 10.1016/S0167-6423(97)00029-4
  sabry: 10.1017/s0956796897002943
  concurrent: 10.1145/237721.237794
  launchbury: 10.1145/158511.158618
  datafun: 10.1145/2951913.2951948
  observable-sharing: 10.1007/3-540-46674-6_7
  kildall-73: 10.1145/512927.512945
  kam-ullman-76: 10.1145/321921.321938
  spygame: 10.1145/3371101
  cocaml: 10.3233/FI-2017-1473
  secrets: 10.1017/S0956796802004331
  modular: 10.1017/S0956796817000016
  longley: 10.1145/317636.317775
  nievergelt: 10.1145/800152.804906
  runST2: 10.1145/3527326
  polakow: 10.1145/2804302.2804309
  lvars: 10.1145/2502323.2502326
  typesafe-sharing: 10.1145/1596638.1596653
  pure-functional: 10.1007/978-3-642-14162-1_17
  clairvoyant: 10.1145/3341718
subs:
  - replace: Peyton Jones
    with: '{Peyton Jones}'

and turns it into a nice .bib file:

$ ./doi2bib.py < doibib.yaml > dblp.bib
$ head dblp.bib
@inproceedings{unsafePerformIO,
  author       = {Simon L. {Peyton Jones} and
                  Simon Marlow and
                  Conal Elliott},
  editor       = {Pieter W. M. Koopman and
                  Chris Clack},
  title        = {Stretching the Storage Manager: Weak Pointers and Stable Names in
                  Haskell},
  booktitle    = {Implementation of Functional Languages, 11th International Workshop,
                  IFL'99, Lochem, The Netherlands, September 7-10, 1999, Selected Papers},

The last bit of the .yaml file, the subs section, allows me to do some fine-tuning of the output, because unfortunately not even DBLP’s BibTeX entries are perfect, for example when a family name consists of two words, as with Peyton Jones.
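As a minimal sketch of what the subs step does, here is the same kind of re.sub loop applied to a hand-written author field. The sample string and the helper name apply_subs are made up for illustration; they are not actual DBLP output or part of the script.

```python
import re

# Same shape as the `subs` list in the .yaml file.
subs = [{"replace": "Peyton Jones", "with": "{Peyton Jones}"}]


def apply_subs(bib, subs):
    """Apply each regex substitution in order (hypothetical helper)."""
    for sub in subs:
        bib = re.sub(sub["replace"], sub["with"], bib)
    return bib


# Hand-written sample line, not real DBLP output.
raw = "author = {Simon L. Peyton Jones and Simon Marlow},"
fixed = apply_subs(raw, subs)
print(fixed)
```

Without the extra braces, BibTeX takes only the last word, Jones, as the family name and treats Peyton as a given name. Note that the replace value is interpreted as a regular expression, so special characters in it would need escaping.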

Now I have fewer moving parts to worry about, and a more consistent bibliography.

The script is rather small, so I’ll just share it here:

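#!/usr/bin/env python3

import sys
import yaml
import requests
import requests_cache
import re

# Cache DBLP responses in a local SQLite file so that repeated runs
# do not hit the server again.
requests_cache.install_cache(backend='sqlite')

data = yaml.safe_load(sys.stdin)

for key, doi in data['entries'].items():
    # Fetch the BibTeX entry that DBLP has for this DOI.
    response = requests.get(f"https://dblp.org/doi/{doi}.bib")
    response.raise_for_status()  # stop instead of printing an error page
    bib = response.text
    # Replace DBLP's own citation key with the one from the .yaml file.
    bib = re.sub('{DBLP.*,', '{' + key + ',', bib, count=1)
    # Apply the optional regex substitutions for fine-tuning.
    for subs in data.get('subs', []):
        bib = re.sub(subs['replace'], subs['with'], bib)
    print(bib)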
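The key-rewriting step can be tried offline, without contacting DBLP. The following sketch uses the same regular expression as the script on a made-up entry. The DBLP key, author, and title in the sample are hypothetical placeholders, rekey is a helper name invented here, and count=1 is an addition in this sketch so that only the first match is touched.

```python
import re


def rekey(bib, key):
    """Swap the DBLP-assigned citation key for a custom one (hypothetical helper)."""
    # '.' does not match a newline, so the greedy '.*' stays on the first
    # line and stops at that line's last comma.
    return re.sub('{DBLP.*,', '{' + key + ',', bib, count=1)


# Made-up entry for illustration only; not real DBLP data.
sample = (
    "@inproceedings{DBLP:conf/example/Doe99,\n"
    "  author = {Jane Doe},\n"
    "  title  = {An Example},\n"
    "}\n"
)

rekeyed = rekey(sample, "doe-example")
print(rekeyed)
```

Only the first line of the entry changes; the fields below it are passed through untouched.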

There are similar projects out there, e.g. dblpbibtex in C++ and dblpbib in Ruby. These allow direct use of \cite{DBLP:rec/conf/isit/BreitnerS20} in LaTeX, which is also nice, but for now I like to choose more descriptive citation keys myself.
