Crawl-On-Yarn Application

Crawls HTML pages from user.profile.tags.us.txt and calculates the top 10 words by frequency.
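
As a rough illustration of the "top 10 words" calculation, the shell sketch below counts word frequencies in plain text read from stdin. It is not the application's implementation, which runs as a Java application on YARN. The `top_words` function name and the `word,count` output format are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository):
# print the N most frequent words read from stdin as "word,count" lines.
top_words() {
  local n="${1:-10}"                # number of words to report, default 10
  tr -cs '[:alpha:]' '\n' |         # replace every non-letter run with a newline
    tr '[:upper:]' '[:lower:]' |    # normalize case so "Yarn" and "yarn" match
    grep -v '^$' |                  # drop empty lines
    sort | uniq -c |                # count occurrences of each word
    sort -k1,1nr -k2,2 |            # highest count first, ties in alphabetical order
    head -n "$n" |
    awk '{ print $2 "," $1 }'       # reorder to word,count
}

# Example input; any already-downloaded page text could be piped in instead.
printf 'Yarn runs apps. Yarn schedules apps; yarn scales.\n' | top_words 2
```

The function takes the number of words as an optional first argument, so `top_words` with no argument reports ten words.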

To set up and run the application, follow these steps:

Create a home directory for the yarn user in HDFS:

#!bash
su hdfs -c "hdfs dfs -mkdir /user/yarn"
su hdfs -c "hdfs dfs -chown yarn /user/yarn"

Go to the application home directory and build the project:

#!bash
mvn clean package

Copy ${application_home}/target/crawl-on-yarn-0.0.1.jar, ${application_home}/log4j.properties, and ${application_home}/user.profile.tags.us.txt to the /opt directory:

#!bash
cp ${application_home}/target/crawl-on-yarn-0.0.1.jar /opt
cp ${application_home}/log4j.properties /opt
cp ${application_home}/user.profile.tags.us.txt /opt

Run the application:

#!bash
su yarn -c "java -cp $(hadoop classpath):/etc/hadoop/*:/opt/crawl-on-yarn-0.0.1.jar home.nkavtur.client.CrawlAppClient -log_properties /opt/log4j.properties -user_profile_tags /opt/user.profile.tags.us.txt"

After the application completes, copy the result file from HDFS to the local file system to view it:

#!bash
su yarn -c "hdfs dfs -copyToLocal Crawl-On-Yarn-App/{appId}/res.csv /home/yarn"
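
The `{appId}` placeholder in the command above has to be replaced with the ID of the finished application. The sketch below assumes that this is the application ID reported by the YARN CLI, which this README does not confirm. The `result_path` helper and the example ID are hypothetical and only show the substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the relative HDFS path of the result file
# for a given application ID (path layout taken from the command above).
result_path() {
  local app_id="$1"
  printf 'Crawl-On-Yarn-App/%s/res.csv\n' "$app_id"
}

# List finished applications to look up the ID, if the yarn CLI is installed.
if command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates FINISHED || true
fi

# Made-up ID, used here only to illustrate the substitution.
result_path "application_0000000000000_0001"
```

The printed path can then be passed to `hdfs dfs -copyToLocal` in place of the `{appId}` form.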

Logs are available here:

/var/log/hadoop/yarn/CrawlApplicationMaster.stderr
