Create an RSS feed for a web page which does not offer its own.
When making HTTP requests, you can pass the following parameters to extract.php (either in a GET request or POST request). For most uses, supplying the page URL (url) and a selector (either in_id_or_class or item) is enough.
We do not provide form fields for all of these parameters, but you can modify the URL in your browser after clicking 'Preview' to use them. See our examples.
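As a rough illustration of editing the request URL by hand, the sketch below assembles a GET request for extract.php from a dictionary of parameters. The host name `example.org` is a placeholder, not a real endpoint, and the helper `build_feed_url` is a hypothetical name introduced only for this example; the parameter names come from the table on this page.

```python
from urllib.parse import urlencode

# Placeholder address (assumption): point this at the extract.php you actually use.
BASE = "https://example.org/extract.php"


def build_feed_url(base, params):
    """Return the base address followed by a URL-encoded query string."""
    return base + "?" + urlencode(params)


feed_url = build_feed_url(BASE, {
    "url": "www.sankei.com/premium/topics/premium-26405-t1.html",  # page to scan
    "item": "div.news .item",  # CSS selector for the elements holding each item
    "max": 5,                  # at most five items
})
print(feed_url)
```

Letting `urlencode` do the escaping avoids mistakes with characters such as `/` and spaces inside selector values.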
|Parameter||Value||Description|
|url||string (URL)||Web page URL containing links which we want to extract for our RSS feed. This is required.|
|in_id_or_class||string (attribute value)||Find links inside elements whose id or class attribute matches this value. This translates to the following XPath, where string stands for the supplied value: //a[@href and ancestor-or-self::*[@id="string" or contains(concat(" ",normalize-space(@class)," "), " string ")]]|
|item||string (CSS selector)||Look inside element(s) matching this CSS expression (for example: div.news .item). Cannot be used in combination with in_id_or_class.|
|item_title||string (CSS selector) or 0||Extract item title from element matching CSS selector. This is applied within the context of elements selected by item. If omitted, the text of the first matching <a> element will be used. If set to 0, titles will not be included in the output.|
|item_url||string (CSS selector), 0 or 1||Extract item URL from element matching CSS selector. This is applied within the context of elements selected by item. If omitted, the URL of the first matching <a> element will be used. If set to 0, URLs will not be included in the output. If set to 1, all item URLs will point to the input URL.|
|item_desc||string (CSS selector)||Extract item description from element matching CSS selector. This is applied within the context of elements selected by item. If omitted, the generated feed will not include item descriptions.|
|item_date||string (CSS selector)||Extract item date from element matching CSS selector. This is applied within the context of elements selected by item. If omitted, the generated feed will not include item date.|
|url_contains||string (URL substring)||Filter out any item whose URL does not contain one of these substrings. This can appear multiple times. Example: /article/|
|text_contains||string (word or phrase)||Filter out any item whose title or description does not contain any of the supplied words or phrases. This can appear multiple times.|
|unique_url||1 (default), 0||If multiple matching items have the same URL, only the first encountered will be kept. Set to 0 to keep all matching URLs. Note: this is enabled by default, but always disabled when you ask for URLs to be excluded from output (item_url=0).|
|unique_title||1 (default), 0||If multiple matching items have the same title, only the first encountered will be kept. Set to 0 to keep all matching titles. Note: this is enabled by default, but always disabled when you ask for titles to be excluded from output (item_title=0).|
|strip||string (CSS selector)||Remove elements matching CSS selector. This will be processed before we start looking for items. To strip multiple elements, use commas. For example: .header,.footer|
|strip_if_url||string||Filter out any item whose URL contains any of the supplied words or phrases. This can appear multiple times.|
|strip_if_text||string||Filter out any item whose title or description contains any of the supplied words or phrases. This can appear multiple times.|
|feed_title||string||The feed title to use in the generated feed. If omitted, we'll use whatever's in the <title> element of the web page requested. Note: this should be the actual title, not a selector.|
|max||number (limit: 10)||The maximum number of items to return. In the self-hosted version the limit can be changed in the config file.|
|format||rss (default), json||By default the output will be formatted as RSS 2.0. Set this to json if you prefer JSON output.|
|order||document (default), reverse||Results are returned in document order by default (i.e. in the order they appear in the source HTML). Set this to reverse to return them in the opposite order.|
|parser||libxml (default), html5||Libxml is a fast parser, but might not handle HTML5 properly. If your items are not being extracted, it might be because of the way the page has been parsed. Try setting the parser to html5 to see if that improves things.|
|saved||string||Saved request parameters. In the config file, you can associate a set of request parameters with a given name. You can then pass that name in the request instead of the request parameters. This is useful if you want to subscribe to a generated feed and then update it later without changing the feed URL.|
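To make the in_id_or_class rule more concrete, here is a minimal sketch of the same matching logic written with Python's standard `html.parser`. It is not how Feed Creator itself is implemented (that uses the XPath expression quoted in the table); the class name `LinkFinder` and the sample markup are invented for illustration. A link is collected when it has an href and either the link itself or one of its ancestors has an id equal to the value or a class token equal to the value.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not be pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LinkFinder(HTMLParser):
    """Collect hrefs of <a> elements that match `value` or sit inside a match."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.stack = []   # one (tag, matched) entry per currently open element
        self.links = []

    def _matches(self, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()  # whitespace-separated tokens
        return attr.get("id") == self.value or self.value in classes

    def handle_starttag(self, tag, attrs):
        matched = self._matches(attrs)
        inside = matched or any(m for _, m in self.stack)  # ancestor-or-self
        href = dict(attrs).get("href")
        if tag == "a" and href is not None and inside:
            self.links.append(href)
        if tag not in VOID:
            self.stack.append((tag, matched))

    def handle_endtag(self, tag):
        # Close the most recent open element with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


page = """
<div id="sidebar"><a href="/about">About</a></div>
<ul class="news latest">
  <li><a href="/news/1">One</a></li>
  <li><a class="news" href="/news/2">Two</a></li>
</ul>
<p class="newsletter"><a href="/subscribe">Subscribe</a></p>
"""
finder = LinkFinder("news")
finder.feed(page)
print(finder.links)  # ['/news/1', '/news/2']
```

Note that `class="newsletter"` does not match `news`: the class test compares whole tokens, which is what the `concat(" ", normalize-space(@class), " ")` trick in the XPath achieves.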
Required parameters: url must be supplied.
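The filtering parameters described in the table can be read as a small pipeline. The sketch below models url_contains, strip_if_url, unique_url and max on a list of already extracted items. The function `filter_items`, the item dictionaries, the order in which the steps run, and the case-sensitive substring comparison are all assumptions made for this example; the real application may behave differently in those details.

```python
def filter_items(items, url_contains=(), strip_if_url=(), unique_url=True, max_items=10):
    """Filter a list of {'title': ..., 'url': ...} dicts given in document order."""
    kept, seen = [], set()
    for item in items:
        url = item["url"]
        # url_contains: keep only URLs containing at least one of the substrings.
        if url_contains and not any(s in url for s in url_contains):
            continue
        # strip_if_url: drop URLs containing any of the substrings.
        if any(s in url for s in strip_if_url):
            continue
        # unique_url: keep only the first item seen for each URL.
        if unique_url:
            if url in seen:
                continue
            seen.add(url)
        kept.append(item)
    # max: cap the number of returned items (assumed here to apply last).
    return kept[:max_items]


items = [
    {"title": "A", "url": "/news/1"},
    {"title": "B", "url": "/sport/2"},
    {"title": "A again", "url": "/news/1"},
    {"title": "C", "url": "/news/ads/3"},
    {"title": "D", "url": "/news/4"},
]
result = filter_items(items, url_contains=["/news/"], strip_if_url=["/ads/"])
print([item["title"] for item in result])  # ['A', 'D']
```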
Our hosted service (the one accessible via the form at the top of this page) is free to use. It is intended for light, personal use and to demo what our self-hosted package can do. We cache web pages for around 30 minutes and limit results to 10 items. We do not currently offer a premium plan, so if you are a developer or otherwise need to make a lot of requests, please purchase our self-hosted package.
You don't have to rely on our hosted service. You can buy the application and host it yourself. Here's what you get:
Please download and run our simple compatibility test before purchasing. It's a single (zipped) PHP file you can upload to your server and access through your browser. If you need help hosting it, we have a few hosting suggestions.
Feed Creator 1.3 — Changelog
Free updates for 1 year. Buy Now: 30 €
Feed Creator plus:
Free updates forever! Buy Bundle: 60 €