Student Thesis Project: Author extraction

TL;DR: Build the next-generation open-source Python Newspaper framework: https://newspaper.readthedocs.io

@BibHack
Ξ 1.414
0.0 (0)
No Disputes

Delivery time: 60 days

Gigs Completed: 0

Gig details

I am looking for a student who needs a thesis project for their university and wants to do it with an outside company.

TL;DR: Build the next-generation open-source Python Newspaper framework: https://newspaper.readthedocs.io/en/latest/

I have a commercial interest in this technology moving forward. The current Python author detection libraries are bad, getting the author wrong 70% of the time, which is why I am funding an open-source project.

Key features:
1. Extract the author name
2. Detect whether an author name is present
3. Extract the bounds of the article (i.e. get the text of the article without any of the comments or sidebars around it)
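
As a rough illustration of features 1 and 2, here is a minimal baseline sketch that reads the author from HTML meta tags using only the Python standard library. It is not how Newspaper works internally, and the names AuthorMetaParser, extract_author and has_author, as well as the list of meta keys, are assumptions made for this example. A real solution would need to go well beyond this, since many pages carry no such tag or put a profile URL in it.

```python
from html.parser import HTMLParser
from typing import List, Optional


class AuthorMetaParser(HTMLParser):
    """Collects candidate author names from <meta> tags (hypothetical helper)."""

    # Meta keys assumed to carry a byline; this list is illustrative, not exhaustive.
    AUTHOR_KEYS = {"author", "article:author"}

    def __init__(self) -> None:
        super().__init__()
        self.candidates: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # The key may live in either the "name" or the "property" attribute.
        key = (attr.get("name") or attr.get("property") or "").lower()
        content = (attr.get("content") or "").strip()
        if key in self.AUTHOR_KEYS and content:
            self.candidates.append(content)


def extract_author(html: str) -> Optional[str]:
    """Feature 1: return the first author candidate, or None if there is none."""
    parser = AuthorMetaParser()
    parser.feed(html)
    return parser.candidates[0] if parser.candidates else None


def has_author(html: str) -> bool:
    """Feature 2: report whether any author name was found."""
    return extract_author(html) is not None


# Made-up page used only to show the call pattern.
page = '<html><head><meta name="author" content="Jane Doe"></head><body><p>Text</p></body></html>'
print(extract_author(page), has_author(page))
```

A baseline like this is mainly useful as a reference point: the new library should beat it on pages where the byline only appears in the visible article markup.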

Some related publications on author extraction:
https://moz.com/devblog/web-page-author-extraction/
https://www.microsoft.com/en-us/research/wp-content/uploads/2012/10/2012-cikm-2387.pdf
https://www.researchonline.mq.edu.au/vital/access/services/Download/mq:3573/DS01
https://pdfs.semanticscholar.org/65ce/abcbf11e2680ed984ac86d4539aeb65f98a3.pdf

We will test the quality of your new library by comparing its output against hand-labeled ground truth data and against the leading paid API: https://www.diffbot.com/dev/docs/article/
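
To make the comparison concrete, here is a minimal sketch of how author accuracy against hand-labeled data could be scored. The exact metric is not specified in this gig, so the exact-match rule after normalization, the function names normalize and author_accuracy, and the sample data are all assumptions for illustration. The same function could be run on the output of the new library and on the paid API to compare them on equal terms.

```python
from typing import Dict, Optional


def normalize(name: Optional[str]) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join((name or "").lower().split())


def author_accuracy(
    predicted: Dict[str, Optional[str]],
    truth: Dict[str, Optional[str]],
) -> float:
    """Fraction of labeled pages whose predicted author matches the label.

    A label of None means the page has no author, so predicting None
    on such a page counts as correct.
    """
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1
        for url, expected in truth.items()
        if normalize(predicted.get(url)) == normalize(expected)
    )
    return correct / len(truth)


# Made-up labels and predictions, keyed by a page identifier.
truth = {"page-a": "Jane Doe", "page-b": None, "page-c": "John Smith"}
predicted = {"page-a": "jane  doe", "page-b": None, "page-c": "Staff"}
print(f"accuracy: {author_accuracy(predicted, truth):.2f}")
```

Exact matching is strict; a real evaluation might also want partial credit for multi-author bylines, which this sketch does not attempt.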

Secondary optional features:
4. Detect the article's publication timestamp
5. Detect the title of the article
6. Detect the language
7. Get keywords
8. Summarize the article
9. Get the list of outbound links inside the article text

lvl 01
From: United Kingdom
Member since November 2018
Last seen: 11 months, 1 week ago
XP 0