Wikipedia pages wrap the main content of the page in a pair of HTML comments, like this:
<!-- bodytext --> <!-- /bodytext -->
Here is Java code that extracts everything between those two comments (the page HTML is assumed to be in the String variable sResult):
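import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Get the body. DOTALL lets '.' match line breaks, since the body spans many lines;
// the reluctant (.+?) stops at the first closing marker instead of the last one.
Pattern rxBodyText = Pattern.compile("<!-- bodytext -->(.+?)<!-- /bodytext -->", Pattern.DOTALL);
Matcher m = rxBodyText.matcher(sResult);
if (m.find())
{
    sResult = m.group(1);
}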
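The snippet above can be wrapped into a small, self-contained class. This is a sketch, not the original author's code: the class name BodyTextExtractor, the method names, and the sample HTML are hypothetical. It compiles the Pattern once and reuses it, and uses (.*?) so an empty body is also accepted. Like the original, it leaves the input unchanged when the markers are missing. The indexOf variant is an alternative I swapped in that avoids regular expressions entirely; it returns the same result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyTextExtractor {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    // DOTALL makes '.' match line terminators so the body can span many lines.
    // The reluctant (.*?) stops at the first closing marker after the opening one.
    private static final Pattern BODY_TEXT = Pattern.compile(
            "<!-- bodytext -->(.*?)<!-- /bodytext -->", Pattern.DOTALL);

    private static final String OPEN = "<!-- bodytext -->";
    private static final String CLOSE = "<!-- /bodytext -->";

    /** Returns the text between the markers, or the input unchanged if no marker pair is found. */
    static String extractBodyText(String html) {
        Matcher m = BODY_TEXT.matcher(html);
        return m.find() ? m.group(1) : html;
    }

    /** Same result using plain string search instead of a regular expression. */
    static String extractBodyTextIndexOf(String html) {
        int start = html.indexOf(OPEN);
        if (start < 0) {
            return html; // no opening marker
        }
        start += OPEN.length();
        int end = html.indexOf(CLOSE, start);
        return end < 0 ? html : html.substring(start, end); // no closing marker: unchanged
    }

    public static void main(String[] args) {
        // Made-up sample page, not real Wikipedia output.
        String page = "<html><body>\n<!-- bodytext -->\n<p>Article text</p>\n<!-- /bodytext -->\n</body></html>";
        System.out.println(extractBodyText(page));
        System.out.println(extractBodyText(page).equals(extractBodyTextIndexOf(page)));
    }
}
```

Whether a given page actually contains these marker comments depends on the HTML being fetched, so it is worth checking the raw page source before relying on either method.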