JSoup – Java HTML Parser

26 May

JSoup is a Java library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
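To make the three styles mentioned above a little more concrete, here is a minimal sketch. The class name `JsoupStyles`, the method names, and the sample markup are made up for illustration and are not from the original post; only the jsoup calls themselves (`Jsoup.parse`, `getElementById`, `select`, `addClass`, `attr`) are library API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupStyles {

    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<html><body><div id=\"content\">"
        + "<p class=\"intro\">Hello</p><p>World</p>"
        + "</div></body></html>";

    // DOM style: look up an element by id, then read its first child's text.
    static String domStyle(Document doc) {
        Element content = doc.getElementById("content");
        return content.child(0).text();
    }

    // CSS style: select the paragraph that carries the "intro" class.
    static String cssStyle(Document doc) {
        return doc.select("p.intro").first().text();
    }

    // jQuery-like style: chain setters over every matched element,
    // then return the last paragraph so the caller can inspect it.
    static Element manipulate(Document doc) {
        doc.select("p").addClass("seen").attr("lang", "en");
        return doc.select("p").last();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(domStyle(doc));
        System.out.println(cssStyle(doc));
        System.out.println(manipulate(doc).outerHtml());
    }
}
```

The code needs the jsoup jar on the classpath, which is what the steps in this post set up.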

Here are the simple steps to get started with this API.

#1. Add the dependency entry below to the pom.xml

<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.8.2</version>
</dependency>

#2. Run the Maven goals below so that the change takes effect and the API is ready to use.

clean compile install

eclipse:eclipse

#3. Right-click the project, open Java Build Path, and tick the JSoup jar entry.

[Screenshot: jsoup jars]

#4. Add the jar in the Deployment Assembly section

[Screenshot: jsoup deployment assembly]

 

Click the OK button.

#5. Create a Java class and play around with the JSoup API. Here is a sample Java file.

[Screenshot: example code]
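The sample file was originally shown only as an image, so its exact contents are not reproduced here. As a stand-in, the following is a minimal sketch of what such a class might look like. The class name `JsoupExample`, the helper methods, the sample HTML, and the base URI are all assumptions for illustration. It parses an HTML string, reads the page title, and lists every link with its absolute URL.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {

    // Hypothetical input; a real program might load this from a file or a URL.
    static final String SAMPLE_HTML =
        "<html><head><title>Sample page</title></head><body>"
        + "<a href=\"/docs\">Docs</a>"
        + "<a href=\"http://example.org/faq\">FAQ</a>"
        + "</body></html>";

    // Assumed base URI, used by jsoup to resolve relative links such as "/docs".
    static final String BASE_URI = "http://example.com/";

    // Returns the text of the <title> element.
    static String pageTitle(String html) {
        return Jsoup.parse(html).title();
    }

    // Collects "link text -> absolute URL" for every anchor that has an href.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<String>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.text() + " -> " + a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println("Title: " + pageTitle(SAMPLE_HTML));
        for (String link : extractLinks(SAMPLE_HTML, BASE_URI)) {
            System.out.println(link);
        }
    }
}
```

Passing a base URI to `Jsoup.parse` is a design choice here: without it, `absUrl("href")` has nothing to resolve relative links against and returns an empty string for them.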

Here is a screenshot of the result for reference.

[Screenshot: result]

Some useful resources:

JSoup home here.

Download Jar or dependencies reference here.

JSoup cookbook here.

JSoup API reference here.

Open Source HTML Parsers in Java

stackoverflow – Which HTML Parser is the best.

Comparison of HTML parsers here

“Compare Best HTML Parsers” in Java.

Happy learning!

Keep watching this space for more updates.

Have a nice day 🙂


Posted by on May 26, 2015 in Uncategorized

 
