@dobbersc pointed out that there are some obvious errors in the current test cases, especially when looking at the plaintext #338
This issue is a checklist to go through all test cases and write down anything weird that's present in the test cases.
Update: 22.02
I went through every JSON. The checkboxes now indicate whether the parser still needs a fix.
at
ORF
de
Die Welt:
Mitteldeutscher Rundfunk (MDR): Image captions included in the plain text; author byline at the end
Updated MDR Parser #370
JS included in the plain text
Add functionality to exclude tags from extraction and normalize space #382
Author byline at end of article with dpa
Fix zeit paragraph selector #371
Berliner Zeitung: Duplicate nodes for summary/paragraphs and headline/paragraphs. There was a copy-paste error in 4d29dcc on the summary selector; div[data-testid=article-header] > p is the correct one
Update Berliner Zeitung - paragraph selector #372
Author included in article text
Update NDR paragraph selector #373
Couldn't load entire text because of privacy restrictions
BusinessInsider: multiple Updates #376
fr
Selectors for summary and subheadlines are missing although given in the article.
LeMonde: add summary and subheadline selector #374
na
Summary is missing.
The Namibian Parser Update #363
uk
Bad test case
TheGuardian: Update test case #375
Subheadlines missing
TheTelegraph: add subheadline selector #377
Change test case with https://inews.co.uk/culture/television/women-took-over-tv-detective-drama-2846727
us
Related content included in the plaintext
Adjust paragraph selector for CNBC #366
Image captions included; related content
Adjust paragraph selector for gateway pundit #367
Related content included
Adjust paragraph selector for Fox News parser #368
JS included in the V1 test case; V2 not fully extracted, only the first 6 out of 15 paragraphs
Fix malformed HTML for TheNation #385
World Truth
There is a summary that could be parsed from the HTML
Add_summary_selector to FreeBeacon #380
Summary selector not working
Fix_summary_selector for TheNewYorker #379
First paragraph seems to be a summary
Fix sitemap filters for OccupyDemocrats #381
Subheadlines missing: h3[class*=story-title]
Add subheadline selector to LATimes #378
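Several entries above report JavaScript or image captions leaking into the plain text, and #382 is titled "Add functionality to exclude tags from extraction and normalize space". The sketch below illustrates that general idea using only the Python standard library. It is not the code from #382 or from any parser in this repository; the names `PlainTextExtractor` and `extract_plaintext` and the default excluded tags are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: collects text while skipping excluded elements."""

    def __init__(self, excluded=("script", "style")):
        super().__init__()
        # Assumption: only non-void elements (with a closing tag) are excluded.
        self.excluded = set(excluded)
        self.depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when no excluded element is currently open.
        if self.depth == 0:
            self.chunks.append(data)


def extract_plaintext(html, excluded=("script", "style")):
    parser = PlainTextExtractor(excluded)
    parser.feed(html)
    parser.close()
    # Join text nodes with a space, then collapse whitespace runs.
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


sample = "<p>First   paragraph.<script>var x = 1;</script></p>\n<p>Second\nparagraph.</p>"
print(extract_plaintext(sample))
```

Passing a different `excluded` tuple, for example `("figcaption",)`, would drop caption text in the same way. Joining text nodes with a space is a simplification and can leave a stray space before punctuation that follows inline markup.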