JANE API
I’m a fan of using JANE (Journal/Author Name Estimator) for suggestions on where to send papers and even who might make good reviewers. I wanted to incorporate some of JANE’s functionality into a web app I’m building to help inexperienced authors with paper submissions.
However, I had trouble today finding working examples of how to make API calls to JANE and then parse the responses. After much trial and error, I found out that pandas can parse HTML tables with read_html (it seems to rely on an HTML parser such as lxml or BeautifulSoup under the hood).
So, here’s how to submit an abstract to JANE and parse the output in Python.
First, prepare your text using urllib.
import urllib.parse

text = """
This is some text.
It can be a paragraph -- or many.
"""
# Trim surrounding whitespace, then percent-encode the text for use in a URL
encoded_query_text = urllib.parse.quote(text.strip())
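To see what the quoting step does, here is a small self-contained sketch. The sample sentence is made up for illustration; the point is that spaces and reserved characters such as & and ? are percent-encoded so they cannot be mistaken for parts of the URL itself.

```python
from urllib.parse import quote, unquote

# Hypothetical sample text with characters that are unsafe in a query string
sample = "  Gene expression & cancer: a 50% increase?\n"

encoded = quote(sample.strip())
print(encoded)
# Gene%20expression%20%26%20cancer%3A%20a%2050%25%20increase%3F

# Decoding gives back the trimmed original text
assert unquote(encoded) == sample.strip()
```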
Then, use pandas to request and parse the results. The other commands you can put in the URL string in place of ‘findJournals’ are listed here (pdf link). Note that the SOAP webservice doesn’t seem to be working, at least not for me right now, so requests have to be made through URLs.
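import pandas as pd

# read_html returns a list of DataFrames, one per HTML table found on the page
dfs = pd.read_html(
    "https://jane.biosemantics.org/suggestions.php?findJournals&structured&text="
    + encoded_query_text
)
# Drop rows that are entirely NA, then keep column 1 (journal names plus tags)
journals = dfs[0].dropna(how='all')[1]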
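If you want to swap in a different command, the URL construction can be pulled into a small helper. This is only a sketch: build_jane_url and JANE_BASE_URL are names I made up, and the only command I have actually tried is findJournals, so check the JANE documentation for the others.

```python
from urllib.parse import quote

# Base URL taken from the request above; the constant name is hypothetical
JANE_BASE_URL = "https://jane.biosemantics.org/suggestions.php"


def build_jane_url(text, command="findJournals"):
    """Return a JANE request URL for `text` (hypothetical helper).

    `command` is inserted where 'findJournals' appears in the URL above.
    """
    query_text = quote(text.strip())
    return f"{JANE_BASE_URL}?{command}&structured&text={query_text}"


url = build_jane_url("  This is some text.  ")
print(url)
# https://jane.biosemantics.org/suggestions.php?findJournals&structured&text=This%20is%20some%20text.
```

The resulting string can be passed straight to pd.read_html in place of the concatenated URL.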
The table that read_html returned had some rows that were entirely NA for me, so I dropped those with dropna(how='all'). After that, I was interested in just the journal names. The column I kept combines each journal name with some of the tags JANE assigns, so I wrote a loop to separate the two.
import re

journal_records = []
for journal in journals:
    # Replace non-breaking spaces with regular spaces
    journal_cleaned = journal.replace(u'\xa0', u' ')
    journal_records.append(
        {
            # Journal name with the tag labels stripped out
            'journal': re.sub('PMC|Medline-indexed|High-quality open access', "", journal_cleaned).strip(),
            # One boolean flag per JANE tag
            'Medline-indexed': ('Medline-indexed' in journal_cleaned),
            'High-quality open access': ('High-quality open access' in journal_cleaned),
            'PMC': ('PMC' in journal_cleaned)
        }
    )
journal_df = pd.DataFrame(journal_records)
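The same parsing can be written as a function and tried out without calling JANE at all. This is a rough sketch: parse_journal_cell is a hypothetical name, and the sample cell is invented to mimic the name-plus-tags strings described above, not copied from real JANE output.

```python
import re

# The three tag labels handled in the loop above
TAGS = ("Medline-indexed", "High-quality open access", "PMC")
TAG_PATTERN = re.compile("|".join(re.escape(tag) for tag in TAGS))


def parse_journal_cell(cell):
    """Split one combined cell into a journal name and tag flags (hypothetical helper)."""
    cleaned = cell.replace("\xa0", " ")
    # Remove the tag labels, then collapse any leftover runs of whitespace
    name = " ".join(TAG_PATTERN.sub("", cleaned).split())
    record = {"journal": name}
    for tag in TAGS:
        record[tag] = tag in cleaned
    return record


# Invented example cell: name and tags separated by non-breaking spaces
sample_cell = "Example Journal of Testing\xa0Medline-indexed\xa0PMC"
print(parse_journal_cell(sample_cell))
# {'journal': 'Example Journal of Testing', 'Medline-indexed': True, 'High-quality open access': False, 'PMC': True}
```

One caveat that applies to the loop as well: a journal whose actual name contains one of the tag strings (say, "PMC") would have that part stripped and the flag set, so spot-check the output.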