Whispers & Screams
And Other Things

The Latest Referrer Spam - Semalt and Buttons For Website

So, you manage some websites and you're a fan of Google Analytics, or maybe you just use a local server log analyser to view your site stats. If this is you then you can't fail to have noticed that your sites have been getting visits lately from referrer bots called semalt.com and buttons-for-website.com. There are a couple of good reasons why you shouldn't ignore this traffic. In fact you should block it from your site, and if you're using an Apache web server, which most people are these days, then I'll show you how to do it for yourself.

The Semalt and Buttons For Website bots don't seem to be harmful to websites per se; however, their effect on SEO should not be ignored. If your website is getting 50 or 100 hits per month from these things it will affect your overall recorded bounce rate, since these bots always bounce. This will make it seem as though visitors to your site are not finding the material they were looking for and, to the search engines, may decrease the perceived quality of your site and thereby affect your ranking.
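To put a rough number on that, here is a small Python sketch of the arithmetic. The session counts are made up purely for illustration (they are not taken from any real site), and the only assumption carried over from above is that every bot visit counts as a bounce.

```python
def bounce_rate(bounced_sessions: int, total_sessions: int) -> float:
    """Return the bounce rate as a percentage of all sessions."""
    if total_sessions == 0:
        return 0.0
    return 100.0 * bounced_sessions / total_sessions

# Hypothetical month: 400 human sessions, 160 of which bounced.
human_sessions, human_bounces = 400, 160
# Hypothetical referrer-spam sessions; every one of them bounces.
bot_sessions = 100

without_bots = bounce_rate(human_bounces, human_sessions)
with_bots = bounce_rate(human_bounces + bot_sessions, human_sessions + bot_sessions)

print(f"Bounce rate without bots: {without_bots:.1f}%")
print(f"Bounce rate with bots:    {with_bots:.1f}%")
```

With these example figures the reported bounce rate climbs from 40% to 52% even though no human visitor behaved any differently. The smaller your real traffic, the bigger the distortion.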

It should be noted that Semalt is not your typical bot. Analysis shows that the company uses a QtWebKit browser engine to avoid detection. Consequently, Semalt bots can execute JavaScript and hold cookies, thereby enabling them to avoid common bot filtering methods (e.g., asking a bot to parse JavaScript). Because of their ability to execute JavaScript, these bots also appear in Google Analytics reports as being “human” traffic.

Recently, substantial evidence revealed that Semalt isn’t running a regular crawler. Instead, to generate bot traffic, the company appears to be using a botnet that is spread by malware hidden in a utility called Soundfrost.

“Botnets sometimes compromise computers whose security defenses have been breached and control conceded to a third party. Each such compromised device, known as a “bot”, is created when a computer is penetrated by software from a malware (malicious software) distribution. The controller of a botnet is able to direct the activities of these compromised computers” – Wikipedia

Their botnet involves hundreds or thousands of computers and too many IP addresses to be able to effectively block the crawler via IP Exclusion in Analytics. To see a list of IP addresses associated with Semalt go to this page. It will return a long list (hundreds at least) of IP addresses associated with Semalt.

Blocking these sites like you would other crawlers/spiders in your robots.txt file may not be effective either since compliance with directives in the robots.txt file is voluntary and those who are running something Black Hat certainly do not care about complying with the wishes of others.

Buttons For Website seems to be very similar in function (alleged to be a spambot/botnet) except that it uses a different delivery method. In this case the Buttons For Website site simply offers a handy sharing tool for you to install on your website. However, by installing the supplied code, you are potentially creating a way for a person to hijack (zombify) the web browser of visitors to your site.

According to one article I found, JavaScript hijacking can also be used for nefarious purposes. Even though the article is about using JavaScript to create a botnet through online ads, the same principle should work just as well with a permanent installation like sharing buttons.

“Adding arbitrary JavaScript to ads is easy to do and in the experience of the researchers wasn’t checked very closely by the ad network. To make it more convenient to change the malicious script, rather than placing the script itself in the ad, they put in the script source.” – NetworkWorld

Semalt And Buttons For Website Blocking


Since both Semalt and Buttons For Website traffic is potentially going to be coming from a large number of IP addresses (Semalt from infected computers and Buttons For Website from visitors to infected sites), the option of blocking this traffic by IP exclusion in Analytics would not be effective. An alternative, which is what I have used successfully on all of the WordPress sites that I manage, is to block traffic from semalt.semalt.com and buttons-for-website.com in the .htaccess file of each site.

To do this you have to have access to the files in the root directory on your web host that make up your WordPress, Joomla or Drupal site and be using an Apache system (most hosting providers do). If you have never worked with the files in the root directory of your site and/or are not familiar with editing the .htaccess file ask your webmaster to do it for you. If you make a mistake when editing your .htaccess file, the result can make the site completely unavailable.

If you are comfortable with editing your .htaccess file then adding the following code to it should block both Semalt and Buttons For Website traffic to your site.

# Block visitors referred from semalt.com (also matches subdomains)
RewriteEngine On
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule .* - [F]
# End semalt block
# Block referrer spam from buttons-for-website.com
RewriteEngine On
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]
# End buttons for website block
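
If you want to check whether the rules above are doing their job, one option is to scan your access log for spam referrers and see which status codes they received. The following is only a minimal Python sketch. It assumes your server writes the Apache combined log format, the function name is my own invention, and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Same host patterns as the RewriteCond lines, matched case-insensitively.
SPAM_REFERRER = re.compile(r"semalt\.com|buttons-for-website\.com", re.IGNORECASE)

# In the combined log format the referrer is the first quoted field after
# the status code and the response size.
LOG_LINE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"')

def count_spam_referrers(lines):
    """Count spam-referred requests, grouped by HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and SPAM_REFERRER.search(match.group("referrer")):
            counts[match.group("status")] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [10/Jan/2015:10:00:00 +0000] "GET / HTTP/1.1" 403 199 "http://semalt.semalt.com/" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Jan/2015:10:01:00 +0000] "GET / HTTP/1.1" 200 5120 "http://buttons-for-website.com" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Jan/2015:10:02:00 +0000] "GET /about HTTP/1.1" 200 2048 "https://www.google.com/" "Mozilla/5.0"',
]
print(count_spam_referrers(sample))
```

Once the block is in place you would expect spam-referred requests to show up with status 403 only. Any that still come back as 200 suggest the rules are not being applied, for example because .htaccess overrides are disabled for that directory.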


At Rustyice Solutions we use this method to block Semalt and Buttons For Website traffic on many WordPress, Joomla and Drupal sites that we manage and so far it has resulted in the total elimination of all traffic from these two sites from all of the managed websites. If you do not have a webmaster and are seeing traffic from these sources to your WordPress website we will be happy to help you with the problem. Contact me using the contact form on this site (Click Here) and I will be happy to help for a very small fee.


Too Much Information - Hadoop and Big Data

Hadoop is a free, Java-based programming framework that supports the processing of large amounts of data in a distributed computing environment. It makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes, and it is part of the Apache project sponsored by the Apache Software Foundation. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
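
To see why replicated storage survives node failures, here is a toy Python model. It is not how HDFS actually places blocks (the real placement policy is rack-aware and more involved); it only assumes each block is copied to three distinct nodes, which matches the default HDFS replication factor. The node names and function names are hypothetical.

```python
def place_blocks(num_blocks: int, nodes: list, replicas: int = 3) -> dict:
    """Assign each block to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + offset) % len(nodes)] for offset in range(replicas)]
        for block in range(num_blocks)
    }

def readable_blocks(placement: dict, failed: set) -> list:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]

# Hypothetical five-node cluster holding ten blocks.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(num_blocks=10, nodes=nodes)

# Two nodes fail; every block has three copies, so at least one survives.
survivors = readable_blocks(placement, failed={"node-b", "node-d"})
print(f"{len(survivors)} of {len(placement)} blocks still readable")
```

In this model, data only becomes unreadable when all three nodes holding the same block fail at once, which is why losing a node or two does not interrupt the system.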

Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper.
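
The idea of breaking a job into small parts is easiest to see with the classic word count. The sketch below is plain Python running on a single machine, not the Hadoop API; the function names and the sample fragments are my own, chosen only to illustrate the map, shuffle and reduce steps.

```python
from collections import defaultdict

def map_phase(fragment: str):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    return [(word.lower(), 1) for word in fragment.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each fragment could be mapped on a different node; here we simply loop.
fragments = ["big data big questions", "big clusters process data"]
mapped = [pair for fragment in fragments for pair in map_phase(fragment)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each call to map_phase only looks at its own fragment, the fragments can be handed to any node in any order. In a real cluster the framework takes care of distributing the fragments and collecting the grouped results.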

The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating systems are Windows and Linux but Hadoop can also work with BSD and OS X.

The rapid proliferation of unstructured data is one of the driving forces of the new paradigm of big data analytics. According to one study, we are now producing as much data every 10 minutes as was created from the beginning of recorded time through the year 2003. The preponderance of data being created is of the unstructured variety, up to about 90% according to IDC.

Big data is about being able not just to capture a wide variety of unstructured data, but also to combine that data with other data to gain new insights that can be used in many ways to improve business performance. For instance, in retail, it could mean delivering faster and better services to customers; in research, it could mean conducting tests over much wider sampling sizes; in healthcare, it could mean faster and more accurate diagnoses of illnesses.

The ways in which big data will change our lives are significant, and they are just beginning to reveal themselves to those who are willing to capture, combine, and discover answers to their Big Questions. For big data to deliver on the promise of its vast potential, however, technology must be in place to enable organizations to capture and store massive amounts of unstructured data in its native format. That’s where Hadoop comes in: it has become one of the enabling data processing technologies for big data analytics. Hadoop allows dramatically bigger business questions to be answered. We are already starting to see this realized by large public cloud companies, and it will shortly spread into other IT-oriented industries and services.

More than 50% of participating companies have begun implementing the available Hadoop frameworks as data hubs or auxiliary data repositories to their existing infrastructures, according to Intel’s 2013 IT Manager’s Survey on How Organizations are Using Big Data. In addition, a further 31% of organizations reported evaluating one of the open-source Apache Hadoop frameworks.

So what are the key characteristics IT professionals should know about Hadoop in order to maximize its potential in managing unstructured data and advancing the cause of big data analytics? Here are five to keep in mind:

    1. Hadoop is economical. As an open-source software framework, Hadoop runs on standard servers. Hardware can be added or swapped in or out of a cluster, and operational costs are relatively low because the software is common across the infrastructure, requiring little tuning for each physical server.

    2. Hadoop provides an efficient framework for processing large sets of data. MapReduce is the software programming framework in the Hadoop stack. Simply put, rather than moving data across a network to be processed, MapReduce provides a framework to move the processing software to the data. In addition to simplifying the processing of big data sets, MapReduce also provides programmers with a common method of defining and orchestrating complex processing tasks across clusters of computers.

    3. Hadoop supports your existing database and analytics infrastructures, and does not displace them. Hadoop can handle data sets and tasks that can be a problem for legacy databases. In big data environments, you want to make sure that the underlying storage and infrastructure platform for the database is capable of handling the capacity and speed of big data initiatives, particularly for mission-critical applications. Because of this capacity it can be, and has been, implemented as a replacement for existing infrastructures, but only where it fits the business need or advantage.

    4. Hadoop will provide the best value where it is implemented with the right infrastructure. The Hadoop framework typically runs on mainstream standard servers using common Intel® server hardware. Newer servers with the latest Intel® computing, larger memory footprint, and more cache will typically provide better performance. In addition, Hadoop will perform better with faster in-node storage, so systems should contain some amount of solid-state storage. The storage infrastructure should also be optimized with the latest advances in automated tiering, deduplication, compression, encryption, erasure coding and thin provisioning. When Hadoop has scaled to encompass larger datasets it benefits from faster networks, at which point 10Gb Ethernet rather than typical 1GbE bandwidth provides further benefit.

    5. Hadoop is supported by a large and active ecosystem. Big data is a big opportunity, not just for those using it to deliver competitive advantage, but also for those providing solutions. A large and active ecosystem has developed quickly around Hadoop, as it usually does around open-source solutions. As an example, Intel recently invested $740 million into the leading distribution for Hadoop provided by Cloudera. Vendors are available to provide all or part of the Hadoop stack, including management software, third-party applications and a wide range of other tools to help simplify the deployment of Hadoop.



Unstructured data is growing nonstop across a variety of applications, in a wide range of formats. Those companies that are best able to harness it and use it for competitive advantage are seeing significant results and benefits. That’s why more than 80% of the companies surveyed by Intel are using, implementing or evaluating Hadoop.
