September 26, 2006, 12:02 |
extract bibtex data from pdf/ps files??
|
#1 |
Guest
Posts: n/a
|
Hi,
Very quick but important question. Does anybody know of a Unix/Linux tool that can extract the metadata from PDF or PS files and write it out in BibTeX format? I have hundreds, nearly thousands, of PDF files and I would like to archive them and have the BibTeX info. Any help would be appreciated. I'm looking for either a tool that can do the job in one go or a workaround, as long as I don't have to type it all out.... Cheers, |
|
September 26, 2006, 12:57 |
Re: extract bibtex data from pdf/ps files??
|
#2 |
Guest
Posts: n/a
|
I don't understand all of your needs, but you can extract a lot of the pdf 'meat' using
www.nuance.com/BusinessPDF. At least that's what their ads say! : o ) |
|
September 26, 2006, 13:02 |
Re: extract bibtex data from pdf/ps files??
|
#3 |
Guest
Posts: n/a
|
Because I'm a poor student, I was kind of after something that I didn't have to pay for. What I'm really after is a method that takes a PDF/PS file as its input and writes out the author, year, journal, date, keywords, title etc. of the paper in BibTeX format, which I can then copy and paste into my ref.bib file.
See, easy, that's all I want ;-)) Actually, really useful, not sure if you agree. Any more thoughts? Ever heard of this??? Cheers, |
|
September 26, 2006, 13:56 |
Re: extract bibtex data from pdf/ps files??
|
#4 |
Guest
Posts: n/a
|
I am almost certain that this is a near-impossible task. What you are asking for is a program that can
a) extract text from a PDF (so far so good), or from a PostScript file (very difficult, to say the least)
b) identify the bibliography section (not hard to do)
c) read each bibliography entry and dissect it into "author, year, journal..." without knowing the order of the items or the particular format, which may differ in each paper because of different styles (a formidable task!)
d) write a BibTeX file (easy, once you have finished the impossible)
It's not a trivial problem, but maybe not as difficult as CFD. It certainly would be useful. If you find such a program or end up writing one yourself, please share! |
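To make step c) concrete, here is a minimal Python sketch that dissects an entry written in one assumed reference style. The style, the pattern and the name parse_reference are all made up for illustration. An entry in any other style simply fails to match, which is exactly the difficulty described above.

```python
import re

# Assumed (hypothetical) reference style, one of many found in real papers:
#   "Authors (YEAR). Title. Journal."
REFERENCE_PATTERN = re.compile(
    r"^(?P<author>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<journal>.+?)\.?$"
)


def parse_reference(entry):
    """Return a dict of fields, or None if the entry is in another style."""
    match = REFERENCE_PATTERN.match(entry.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    entry = ("Barth, T.J. and Jespersen, D.C. (1989). The design and "
             "application of upwind schemes on unstructured meshes. "
             "AIAA Paper 89-0366.")
    print(parse_reference(entry))
```

Every additional citation style needs its own pattern, so a general tool would need a whole library of them plus a fallback for entries that match none.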
|
September 26, 2006, 22:51 |
Re: extract bibtex data from pdf/ps files??
|
#5 |
Guest
Posts: n/a
|
The way I do it is with the electronic god called Google. Go to scholar.google.com, make sure "show links to import citations into BibTeX" is turned on in the Scholar preferences, and type the paper name into the search bar. A link ("Import into BibTeX") will appear below each paper; clicking it takes you to a page with the citation. Copy and paste that into your BibTeX file.
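For a large pile of PDFs, the typing part of this can be scripted. The Python sketch below only builds Scholar search URLs to open in a browser; it does not fetch or scrape anything, and the import link still has to be clicked by hand. It assumes each file is named after its paper title, and the helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def title_from_filename(pdf_path):
    """Guess a paper title from the file name (assumes files are named that way)."""
    return Path(pdf_path).stem.replace("_", " ").replace("-", " ")


def scholar_search_url(title):
    """Build a search URL for an exact-phrase query on the title."""
    return SCHOLAR_SEARCH + "?" + urlencode({"q": '"%s"' % title})


if __name__ == "__main__":
    # Print one search URL per PDF in the current directory.
    for pdf in sorted(Path(".").glob("*.pdf")):
        print(scholar_search_url(title_from_filename(pdf)))
```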
|
|
September 26, 2006, 23:48 |
Re: extract bibtex data from pdf/ps files??
|
#6 |
Guest
Posts: n/a
|
Now, *that* is cunning...
diaw... |
|
September 27, 2006, 04:14 |
Re: extract bibtex data from pdf/ps files??
|
#7 |
Guest
Posts: n/a
|
Many, many thanks for all your responses. But!!!!!
I'm not asking for what Mani suggests, I'm asking for the piece of code that Harish suggests. All I want is the information about the paper I supply as the input, not the papers referenced within the input paper. For example, were I to supply Barth's classic linear reconstruction paper from 1989 as the input in PDF format, it would return the following in BibTeX format:
@Article{barth1989,
  author = {T.J. Barth and D.C. Jespersen},
  title = {The design and application of upwind schemes on unstructured meshes},
  journal = {AIAA},
  year = 1989,
  volume = 89,
  number = 0366
}
I don't want the references within Barth's paper, just the BibTeX info for the paper itself. Please note that this has already been thought of in the form of libextractor, which can be got from SourceForge, but it will not install without massive system updates to Linux, which I'm not going to do as I'm writing up!! So many thanks again, and any further thoughts now that I've refined the question would be helpful... Cheers, |
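A rough Python sketch of the kind of extraction asked for here, under a strong assumption: the PDF stores /Title and /Author in its Info dictionary as plain, uncompressed literal strings with no nested parentheses. Many real PDFs do not, and fields such as journal, year and volume are normally not stored there at all. The function names pdf_info_fields and to_bibtex are made up for this example.

```python
import re


def pdf_info_fields(raw):
    """Pull /Title and /Author literal strings out of raw PDF bytes.

    Assumption: the values are uncompressed "(...)" strings without
    nested parentheses or escape sequences.
    """
    fields = {}
    for name in ("Title", "Author"):
        match = re.search(rb"/" + name.encode() + rb"\s*\(([^()]*)\)", raw)
        if match:
            fields[name.lower()] = match.group(1).decode("latin-1")
    return fields


def to_bibtex(key, fields, entry_type="Article"):
    """Format a dict of fields as a single BibTeX entry."""
    lines = ["@%s{%s," % (entry_type, key)]
    lines += ["  %s = {%s}," % (name, value) for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        print(to_bibtex("key", pdf_info_fields(handle.read())))
```

Whatever the Info dictionary does not contain still has to come from somewhere else, for example the front page text or an online lookup.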
|
September 27, 2006, 07:50 |
Re: extract bibtex data from pdf/ps files??
|
#8 |
Guest
Posts: n/a
|
I see, you have made it a bit easier, but not by much. The difficulty now is to identify title, author, journal, year, and so on from the front page. Title and author should be easy, but the rest is actually not always printed, and if it is, it may be a footnote, a headnote, or at some random place. I am curious if the google method works, though. Have you tried?
To be honest with you, now that you have explained exactly what you want, I am not so sure about the necessity any more. Obviously, it would save you some effort of typing. However, you surely would only reference papers that you have actually read (right?), and how much time does it take to make a bibtex entry for a paper, compared to the time you need to read even the abstract and conclusions? Don't be so lazy!! (just kidding,... let me know if it works) |
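As an illustration of how crude front-page guessing has to be, here is a small Python sketch. It assumes the front page has already been extracted to plain text, takes the first non-empty line as the title, and picks the most frequent plausible four-digit year. Both heuristics are assumptions and will be wrong for many layouts; the function name is hypothetical.

```python
import datetime
import re
from collections import Counter


def guess_front_page_fields(text, earliest=1900, latest=None):
    """Guess title and year from front-page text using two crude heuristics."""
    if latest is None:
        latest = datetime.date.today().year
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Candidate years: standalone four-digit numbers starting with 19 or 20.
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)]
    years = [y for y in years if earliest <= y <= latest]
    guess = {}
    if lines:
        guess["title"] = lines[0]  # assumption: the title comes first
    if years:
        guess["year"] = Counter(years).most_common(1)[0][0]
    return guess
```

Journal, volume and number have no comparably simple cue, which is why the rest of the entry is the hard part.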
|
September 27, 2006, 07:54 |
Re: extract bibtex data from pdf/ps files??
|
#9 |
Guest
Posts: n/a
|
On the case now with google method, it's only failed to get one paper out of about 100 so far, and I'd never use that one anyway. Totally amazing method - the oracle rules again!
Cheers, |
|
September 27, 2006, 08:10 |
Re: extract bibtex data from pdf/ps files??
|
#10 |
Guest
Posts: n/a
|
Scary!
|
|
January 19, 2010, 20:10 |
|
#11 |
New Member
Join Date: Jan 2010
Posts: 1
Rep Power: 0 |
How did you do it?
Is it possible on Windows? |
|
January 20, 2010, 12:26 |
|
#12 |
Senior Member
Join Date: Apr 2009
Posts: 159
Rep Power: 17 |
Cool Google trick. I use the free Mendeley desktop client (www.mendeley.com) to manage my PDF collection (it's like Picasa for PDFs). You can right-click and export BibTeX in it ...
|
|