Archive for the 'Google' Category

Google, Yahoo spiders can now crawl through Flash sites

As anyone who has had the pleasure of doing web design and development through marketing agencies knows, Flash tends to be wildly popular among clients and wildly unpopular among, well, pretty much everyone else. Part of the reason is that Flash is so inherently un-Googleable: anything that goes into a Flash-only site is basically invisible to search engines and, therefore, to the world. That will no longer be the case, however, as Adobe announced today that it has teamed up with Google and Yahoo to make Flash files indexable by search engines.

This announcement has been a long time coming, as Flash developers have been wishing for ways to make their content searchable for close to a decade. Adobe acknowledges this in its announcement, saying that although search engines are able to index static text and links within Flash SWF files, “[Rich Internet Applications] and dynamic Web content have been generally difficult to fully expose to search engines because of their changing states—a problem also inherent in other RIA technologies.”

This announcement may also result in some major usability changes (for the better) for Flash on the web. In a post to its Webmaster Central Blog, Google wrote that it can now index all kinds of textual content in SWF files, like that included in Flash gadgets, buttons, menus, entirely self-contained Flash web sites, “and everything in between.” Google can now also follow URLs embedded within Flash files to add to the crawling pipeline. This new indexing technology does not, however, include FLV files (video files that are found on sites like YouTube) because those are generated as videos and don’t contain any text elements like an SWF file does.

Google says it achieved this by developing an algorithm that “explores Flash files in the same way that a person would,” clicking buttons and stepping through Flash content. “Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed,” wrote the company. “We can’t tell you all of the proprietary details, but we can tell you that the algorithm’s effectiveness was improved by utilizing Adobe’s new Searchable SWF library.”

Of course, Google (and eventually Yahoo) won’t be able to index everything embedded within a Flash file, at least not yet. Anything image-related, including text embedded in images, will be invisible to the search engines for the time being. Google also noted that it can’t execute some kinds of JavaScript embedded in a Flash file, and that while it indexes content contained in a separate HTML or XML file, that content won’t be counted as part of the Flash file’s content. These issues are all being worked on, however, and are likely to change in the future.

Yahoo is also working with Adobe to index SWF files, but doesn’t appear to be as far along as Google just yet. One player that is noticeably missing is Microsoft, though. From Adobe’s announcement and the language used by Google, it appears as if each search engine has to work with Adobe to make this possible—meaning that Microsoft has either been excluded by Adobe for this round or has decided to voluntarily sit this one out. Either way, with searchable SWF files down, usability experts can now focus all of their attention on other Flash-related concerns, like blatant design perversion and excessive animation abuse.

YouTube declines request to remove terrorist-produced videos

YouTube has refused a request from U.S. Sen. Joe Lieberman (ID-Conn.) to remove all videos sponsored by terrorist organizations like Al-Qaeda, contending that most of them don’t violate its community guidelines.

Lieberman, chairman of the Senate Committee on Homeland Security and Governmental Affairs, Monday called on the Google Inc. subsidiary to remove video content produced by terrorist organizations that showed assassinations, deaths of U.S. soldiers and civilians, weapons training, “incendiary” speeches and other material intended to “encourage violence against the West.”

“Islamist terrorist organizations use YouTube to disseminate their propaganda, enlist followers, and provide weapons training,” Lieberman said in a letter to Google CEO Eric Schmidt. “YouTube also, unwittingly, permits Islamist terrorist groups to maintain an active, pervasive, and amplified voice, despite military setbacks or successful operations by the law enforcement and intelligence communities.”

In the letter, Lieberman noted that while YouTube posts community guidelines for its users, the company does not appear to follow them. For example, he noted that despite rules prohibiting gratuitous violence on the site, there are videos of Al-Qaeda attacks on U.S. forces in which some soldiers are killed or injured.

When contacted, Google pointed to a YouTube blog post that said the company has removed some of the videos cited by Lieberman, primarily because they depicted gratuitous violence, advocated violence or used hate speech. However, the post also noted that most of the videos in question remain on the site “because they do not violate our community guidelines.”

“Hundreds of thousands of videos are uploaded to YouTube every day,” the YouTube blog post said. “Because it is not possible to pre-screen this much content, we have developed an innovative and reliable community policing system that involves our users in helping us enforce YouTube’s standards. Millions of users report potential violations of our community guidelines.”

YouTube went on to say that it encourages free speech and defends the right of its users to express unpopular points of view.

“We believe that YouTube is a richer and more relevant platform for users precisely because it hosts a diverse range of views, and rather than stifle debate we allow our users to view all acceptable content and make up their own minds,” the company said. “Of course, users are always free to express their disagreement with a particular video on the site, by leaving comments or their own response video. That debate is healthy.”

Mark Hopkins, a blogger for Mashable, noted that YouTube has been “capricious and arbitrary” in deciding which content promotes hate speech or violence and should be removed. For example, he pointed out that YouTube took down a video showing victims of a Muslim terrorist attack but allowed videos of homeless people who were paid to beat each other. A video of clothed women in Hong Kong, set to music derogatory toward women, was removed, while a strip-tease video containing nudity was allowed to remain on the site, he noted.

“[Lieberman's] primary concerns weren’t the usual suspects when you think of the things that American politicians find objectionable (rap music, graphic portrayals of violence, Grand Theft Auto and Janet Jackson’s nipple),” Hopkins noted.

Instead, Hopkins said, Lieberman raised an issue on which YouTube deserves to be called out: allowing itself to participate in the dissemination of propaganda videos produced by Islamic terrorist organizations. “If YouTube can spend millions enforcing DMCA and piracy concerns, they can take a few minutes and respond to valid citizen complaints against usage of the system to promote terrorist organizations,” he added.

YouTube pays users $1 million

YouTube said today it has paid out more than $1 million to its user partners through its partner program. The figure came as part of an announcement that YouTube is expanding the program to users in Japan, Australia and Ireland (it was previously only available in the United States, Canada, and the United Kingdom).

YouTube doesn’t disclose how it splits its revenue, but we’ll make do with what scraps of numbers we have. The site currently lists 100 partners, though that also includes entities that we’d think would be designated as professional partners rather than “user partners,” such as Universal Music Group and CBS.

Break a Leg’s Yuri Baranovsky said he’d collected $1,600 for more than 2 million views on YouTube. If that works out to $800 per million views (it doesn’t exactly, but it gives a rough idea), user partners have been responsible for about 1.25 billion paid views so far.

After Google bought YouTube for $1.65 billion in October 2006, users complained that they weren’t being rewarded for their own hard work in making the site what it was. OK, this math is a little unfair, but divide it out and users have now earned about 0.06 percent of the purchase price. Thanks a lot, Chad and Steve!
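The back-of-the-envelope math above can be checked in a few lines of Python. This is a rough sketch using only the figures quoted in this post; like the post, it treats Baranovsky’s rate as typical and treats “more than 2 million” views as exactly 2 million, and the variable names are illustrative.

```python
# Rough estimates built only from figures quoted in the post, not YouTube data.
payout_total = 1_000_000          # dollars paid to user partners so far
sample_payout = 1_600             # Baranovsky's reported earnings, in dollars
sample_views = 2_000_000          # "more than 2 million" views, rounded down
purchase_price = 1_650_000_000    # Google's October 2006 purchase price, in dollars

# Dollars earned per million views at Baranovsky's rate.
rate_per_million = sample_payout / (sample_views / 1_000_000)

# Paid views implied if every partner earned at that same rate.
implied_views = payout_total / rate_per_million * 1_000_000

# Total payouts as a percentage of the purchase price.
share_of_price = payout_total / purchase_price * 100

print(f"${rate_per_million:.0f} per million views")
print(f"{implied_views / 1e9:.2f} billion implied paid views")
print(f"{share_of_price:.2f} percent of the purchase price")
```

Because the view count is a lower bound, the real per-million rate is a bit under $800, which would push the implied view total a bit above 1.25 billion.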

If you want more current numbers, Bear Stearns estimated that YouTube would pull in $90.2 million in domestic revenue and $13.8 million in international revenue this year, with the vast majority of that coming from banner ads displayed next to videos. YouTube partner videos are the only ones on the site for which YouTube shows overlay ads, which it says it tries to sell for a $20 CPM. Bear Stearns said it expected $22.6 million in overlay ad revenue domestically this year.
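CPM is the price an advertiser pays per thousand ad impressions. As a rough illustration, and assuming every overlay actually sold at the $20 target (the post says only that YouTube “tries” to get that), Bear Stearns’ domestic overlay estimate would correspond to about 1.13 billion overlay impressions. The function names below are illustrative.

```python
def cpm_revenue(impressions: float, cpm_dollars: float) -> float:
    """Revenue from ads sold at a given CPM (price per 1,000 impressions)."""
    return impressions / 1000 * cpm_dollars


def impressions_for_revenue(revenue_dollars: float, cpm_dollars: float) -> float:
    """Impressions needed to reach a revenue figure at a flat CPM."""
    return revenue_dollars / cpm_dollars * 1000


# Bear Stearns' $22.6 million domestic overlay estimate at a $20 CPM.
needed = impressions_for_revenue(22_600_000, 20)
print(f"{needed / 1e9:.2f} billion overlay impressions")
```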

Revver, the OG video rev-share site, hit the $1 million-paid-to-users mark first, in September 2007. The site was later bought by LiveUniverse for about $5 million.

Googlebot crawls through HTML forms

Google will stop at nothing in its quest to index the world’s information. Last year it ate through 100 exabytes of data, but there’s still a lot that it can’t get access to. That material, known as the deep web (or hidden web, or invisible web, etc.), is estimated to make up the majority of online data. Private intranets, unlinked pages, some non-textual content and, until today, dynamic content returned via form input have all been hidden safely from Google’s prying eyes. Google today announced that its Googlebot web crawler would begin to fill out HTML forms and crawl the results.

“For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made,” explained Jayant Madhavan and Alon Halevy in a blog post. “If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.”
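The process described above amounts to enumerating combinations of input values and turning each one into a URL a user could have produced by submitting the form. Here is a minimal Python sketch of that idea only. The function name, the example form and the word list are hypothetical; the `limit` parameter is an assumption added to cap the combinatorial blow-up. Google’s real system also judges whether each result page is “valid, interesting” and new, which is not modeled here.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_form_urls(action, text_inputs, choice_inputs, site_words, limit=10):
    """Build candidate GET URLs for one HTML form (hypothetical helper).

    Text boxes are filled with words taken from the site; select menus,
    check boxes and radio buttons get one of the values listed in the HTML.
    At most `limit` URLs are returned, since the combinations multiply quickly.
    """
    field_names = list(text_inputs) + list(choice_inputs)
    value_lists = [list(site_words) for _ in text_inputs]
    value_lists += [list(values) for values in choice_inputs.values()]

    urls = []
    # Each combination of values corresponds to one possible user query.
    for combo in product(*value_lists):
        urls.append(action + "?" + urlencode(list(zip(field_names, combo))))
        if len(urls) >= limit:
            break
    return urls


# Hypothetical form: a search box plus a category drop-down.
urls = candidate_form_urls(
    "https://example.com/search",
    text_inputs=["q"],
    choice_inputs={"category": ["books", "music"]},
    site_words=["jazz", "poetry"],
)
for url in urls:
    print(url)
```

Two text words times two categories gives four URLs, starting with `https://example.com/search?q=jazz&category=books`.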

Google, which says that crawling dynamic form results doesn’t affect the “crawling, ranking, or selection of other web pages in any significant way,” also assured webmasters today that its enhanced crawl would respect robots.txt as usual: any form forbidden in robots.txt won’t be crawled.
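Respecting robots.txt for generated form URLs works the same way as for any other URL: each candidate is checked against the site’s rules before it is fetched. Here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents, the URLs and the `crawlable` helper are made up for illustration, and the sketch says nothing about how Googlebot implements the check internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site's search-form results are off-limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def crawlable(urls, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Keep only the candidate form URLs that robots.txt allows (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]


candidates = [
    "https://example.com/search?q=jazz",
    "https://example.com/catalog?category=books",
]
print(crawlable(candidates))
```

Only the `/catalog` URL survives, because every URL whose path starts with `/search` is disallowed for all user agents.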

It is estimated that the deep web is several orders of magnitude larger than the regular, public world wide web. While there is some content that Google will never — and should never — get its hands on, by crawling form results Google is now peering just a little bit deeper into the Internet. As Matt Cutts points out, this is less about indexing search results (something Google has generally not liked to do) and more about finding new links that are only available via dynamically created pages.
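Finding links that only appear on dynamically created pages boils down to extracting the links from each form-result page and keeping the ones the index has not seen. Here is a minimal sketch with Python’s standard `html.parser` and `urljoin`. The page markup, the `indexed` set and the helper names are hypothetical stand-ins for a real crawler’s index.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the result page's URL.
                self.links.append(urljoin(self.base_url, href))


def new_links(result_url, html, already_indexed):
    """Return links on a form-result page not yet in the index, in page order."""
    collector = LinkCollector(result_url)
    collector.feed(html)
    collector.close()
    seen = set(already_indexed)
    fresh = []
    for link in collector.links:
        if link not in seen:
            seen.add(link)  # also drops duplicates within the same page
            fresh.append(link)
    return fresh


result_page = '<a href="/item/42">Rare LP</a> <a href="/about">About</a> <a href="/item/42">again</a>'
indexed = {"https://example.com/about"}
print(new_links("https://example.com/search?q=jazz", result_page, indexed))
```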

It should be noted that Google is only crawling GET forms (i.e., forms used to retrieve dynamic content, such as search results) and not POST forms. That’s mildly disappointing, as we were looking forward to befriending Googlebot on MySpace…
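The GET/POST distinction comes down to a form’s `method` attribute. A GET submission encodes the inputs in the URL’s query string, which is what lets the results be crawled like ordinary links. A POST submission sends them in the request body and typically changes something on the server (a sign-up, a post, a purchase). Here is a hedged sketch that filters a page’s forms down to GET forms with Python’s standard `html.parser`. The page markup and helper names are hypothetical.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Record the action and method of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            # HTML treats a missing method attribute as GET.
            method = (attrs.get("method") or "get").lower()
            self.forms.append({"action": attrs.get("action", ""), "method": method})


def get_forms(html):
    """Return only the forms a crawler could submit via a plain URL."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return [f for f in collector.forms if f["method"] == "get"]


page = """
<form action="/search"><input name="q"></form>
<form action="/signup" method="POST"><input name="email"></form>
<form action="/catalog" method="get"><select name="category"></select></form>
"""
print(get_forms(page))
```

The sign-up form is skipped because it uses POST; the other two, including the one with no explicit method, are kept.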

Pentagon bans Google map-makers

The US defence department has banned the giant internet search engine Google from filming inside and making detailed studies of US military bases.

Close-up, ground-level imagery of US military sites posed a “potential threat” to security, it said.

The move follows the discovery of images of the Fort Sam Houston army base in Texas on Google Maps.

A Google spokesman said that where the US military had expressed concerns, images had been removed.

Google has now been barred from filming and conducting detailed studies of bases, following the discovery of detailed, three-dimensional panoramas online - and in particular, views of the Texan base.

“Images include 360-degree views of the covered area to include access control points, barriers, headquarters, facilities and community areas,” said the defence department in a statement quoted by AFP news agency.

It said such detailed mapping could pose a threat.

Google spokesman Larry Yu said the decision by a Google team to enter the Texas base, which is in San Antonio, and undertake a detailed survey, had been “a mistake”.

He told the BBC that it was “not our policy to request access to military installations, but in this instance the operator of the vehicle with the camera on top - which is how we go about capturing imagery for Street-View - requested permission to access a military installation, was given access, and after learning of the incident we quickly removed the imagery”.

Individuals and governments

Military officials are currently looking into exactly what imagery is available, though the military may not be able to order its removal if the images were taken from public streets.

Among the popular mapping services offered by Google are Street View, which allows web users to “drive” along virtual US landscapes with ground-level views, and Google Earth, which offers detailed satellite and 3D images of locations around the world.

In this case, it was imagery offered on Street View that caused the concern.

But both have provoked complaints - from individuals depicted in the images and from governments concerned that satellite images could compromise security.

Gary Ross, a spokesman for the US Northern Command, told AFP that although such services could be useful, “there has to be a balance”.

But Mr Yu said Google would listen to concerns about privacy and security.

“We try to have a compliant image removal policy - not only relative to the military but to consumers also,” said Mr Yu.

“If people have concerns, they should contact us.”
