[computers, ethics, libraries] AI Image Tagging & The Google Divide

Nov 13, 2006 09:21

Researchers teach computers how to name images by 'thinking'

http://live.psu.edu/story/20538

Wednesday, November 1, 2006

University Park, Pa. -- Penn State researchers have "taught" computers how to interpret images using a vocabulary of up to 330 English words, so that a computer can describe a photograph of two polo players, for instance, as "sport," "people," "horse," "polo."

The new system, which can automatically annotate entire online collections of photographs as they are uploaded, means significant time-savings for the millions of Internet users who now manually tag or identify their images. It also facilitates retrieval of images through the use of search terms, said James Wang, associate professor in the Penn State College of Information Sciences and Technology, and one of the technology's two inventors.

The system is described in a paper, "Real-Time Computerized Annotation of Pictures," given at the recent ACM Multimedia 2006 conference in Santa Barbara, Calif., and authored by Jia Li, associate professor, Department of Statistics, and Wang. Penn State has filed a provisional patent application on the invention.

Major search engines currently rely upon uploaded tags of text to describe images. While many collections are annotated, many are not. The result: Images without text tags are not accessible to Web searchers. Because it provides text tags, the ALIPR system -- Automatic Linguistic Indexing of Pictures-Real Time -- makes those images visible to Web users.

ALIPR does this by analyzing the pixel content of images and comparing that against a stored knowledge base of the pixel content of tens of thousands of image examples. The computer then suggests a list of 15 possible annotations or words for the image.
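The press release does not publish ALIPR's algorithm, but the described pipeline -- compare an image's pixel-derived features against a stored knowledge base of tagged examples, then suggest a ranked list of words -- can be sketched as a simple nearest-neighbor tag vote. Everything below (the feature vectors, the knowledge base, the function names) is illustrative, not ALIPR's actual implementation:

```python
import math
from collections import Counter

# Hypothetical miniature knowledge base: each entry pairs a feature vector
# (standing in for ALIPR's color/texture statistics) with human-supplied tags.
KNOWLEDGE_BASE = [
    ([0.8, 0.1, 0.3], ["sport", "people", "horse", "polo"]),
    ([0.7, 0.2, 0.4], ["horse", "animal", "grass"]),
    ([0.1, 0.9, 0.5], ["beach", "ocean", "sky"]),
    ([0.2, 0.8, 0.6], ["ocean", "boat", "people"]),
]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def suggest_tags(features, k=2, top_n=15):
    """Rank candidate tags by votes from the k nearest stored examples."""
    nearest = sorted(KNOWLEDGE_BASE, key=lambda ex: distance(features, ex[0]))[:k]
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in votes.most_common(top_n)]

# A query close to the two "horse" examples draws its tags from them.
print(suggest_tags([0.75, 0.15, 0.35]))
```

The real system would extract features from actual pixels and search tens of thousands of examples, but the retrieve-and-vote shape of the suggestion step is the same.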

"By inputting tens of thousands of images, we have trained computers to recognize certain objects and concepts and automatically annotate those new or unseen images," Wang said. "More than half the time, the computer's first tag out of the top 15 tags is correct."

In addition, for 98 percent of images tested, the system has provided at least one correct annotation in the top 15 selected words. The system, which completes the annotation in about 1.4 seconds, also can be applied to other domains such as art collections, satellite imaging and pathology slides, Wang said.
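The two figures quoted above correspond to standard annotation metrics: the rate at which the top-ranked tag is correct, and the rate at which at least one of the top 15 tags is correct. A minimal sketch of how such rates are computed, over a tiny made-up test set:

```python
def annotation_scores(results):
    """results: list of (ranked_predicted_tags, ground_truth_tag_set).
    Returns (first-tag-correct rate, any-correct-in-top-15 rate)."""
    first_correct = sum(preds[0] in truth for preds, truth in results)
    any_correct = sum(any(t in truth for t in preds[:15]) for preds, truth in results)
    n = len(results)
    return first_correct / n, any_correct / n

# Illustrative data only -- not ALIPR's actual test set.
results = [
    (["sport", "people", "horse"], {"horse", "polo"}),  # first tag wrong, a later one right
    (["beach", "ocean"], {"beach"}),                    # first tag right
]
print(annotation_scores(results))  # (0.5, 1.0)
```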

The new system builds on the authors' previous invention, ALIP, which also analyzes image content. But unlike ALIP, which characterized images using computationally intensive spatial modeling, ALIPR characterizes them by modeling statistical distributions of color and texture.
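Modeling "distributions of color" rather than spatial structure means summarizing an image by how often colors occur, regardless of where they occur. A minimal sketch of that idea, using a normalized per-channel color histogram (ALIPR's actual features are far more sophisticated; this only illustrates the general approach):

```python
def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of 0-255 RGB pixel values.
    Returns bins*3 values; each channel's bins sum to 1."""
    counts = [0] * (bins * 3)
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            counts[channel * bins + min(value * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

# Two red-ish pixels and one blue-ish pixel: the red channel's top bin dominates.
print(color_histogram([(250, 10, 10), (240, 20, 15), (30, 40, 220)]))
```

A histogram like this is cheap to compute and compare, which fits the press release's point that ALIPR reaches real-time speed (about 1.4 seconds per image) by avoiding spatial modeling.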

The researchers acknowledge computers trained with their algorithms have difficulties when photos are fuzzy or have low contrast or resolution; when objects are shown only partially; and when the angle used by the photographer presents an image in a way that is different than how the computer was trained on the object. Adding more training images as well as improving the training process may reduce these limitations -- future areas of research.

A demonstration of the ALIPR system is available online at http://www.alipr.com.

In a companion paper also presented at the ACM conference, the researchers describe another of their systems, one that can use annotations in a retrieval process. This new system leverages annotations from different sources, human and computer. The researchers, who have built a prototype of the system, are working on testing it in real-world situations. That paper, "Toward Bridging the Annotation-Retrieval Gap in Image Search by a Generative Modeling Approach," was authored by Ritendra Datta and Weina Ge, Ph.D. students in computer science and engineering; Li; and Wang.

"Our approach aims at making all pictures on the Internet visible to the users of search engines," Wang said.

Research on both systems was supported by the National Science Foundation.

The Google Divide: Those who inherit it should design the future
John N. Berry III

http://www.libraryjournal.com/article/CA6379527.html

October 15, 2006

Frustration with one another is often the primary tie connecting those library staffers whose careers began in the “pre-Google” era to the “post-Google” newcomers. I realized that during a dinner conversation with an excited group of young leaders of the library digital revolution who traded stories from their work.

It took me a few minutes to figure out why the anecdotes from these young change agents sounded so familiar, even to someone whose career began long before computers. This generation of developing leaders finds its revolution in creative library applications of technology, but its complaints about those who resist the future are nearly identical to those heard in the precomputer prehistory of librarianship, when I started. That was so long ago that my cohorts and I forget that awful frustration. It has always been part of library life.

Today, like yesterday, the frustration camps out in every kind of workplace in our field: from the faculty of LIS programs, where tenured traditionals hold off the digital drive from the new Library 2.0 students, to the smallest public libraries, where young digital innovators try to bring change to a staff and public who still view the role and purpose of the library and librarians much as Dewey did when he first called it a profession. It was there when Jim Welbourne rallied library school students to a Congress for Change in the mid-Sixties. It was there when Pat Schuman and a band of young librarians convinced the American Library Association to oppose the Vietnam War. It was even there when the young Charlie Robinson first shouted, “Give ’em what they want,” and when Fred Kilgour and Hugh Atkinson made shared cataloging a reality with their new OCLC.

As a profession, we have always given vehement lip service to innovation and creativity, to change and progress. So those who were and are new to our profession were and are always surprised in their first, second, or even third jobs to find deeply rooted resistance to new ideas and innovation. Those who made it to top spots, most of them after years of struggle against the same resistance to change, either forgot their frustrations on the way up or want newcomers to go through the same pain they did. It sometimes looks like an initiation rite, since there is rarely any right or wrong in this struggle. Oh, sure, sometimes the new ideas are really crackpot schemes. Sometimes there is a genuine shortage of resources, and a library can’t afford the time and money to make the innovation work. Once in a while change even brings negative criticism that could do political damage.

More often, though, resistance to change is based on fear. Some are wary of reactions from the governing authority and public that rarely materialize. Others imagine their own obsolescence. (When you’re my age, you’ve probably experienced your own obsolescence several times.)

There is fear on both sides of the Google gap. Many library staff are apprehensive about the rules that stifle innovation by spelling out who can go ahead with an idea or speak out about the library in the new channels the digital world has made available to all.

My dinner companions told of a librarian who started a MySpace site for the library without asking permission. There were lots of other tales of librarians saying things on blogs and wikis that older staff deemed “inappropriate.” I wish I were wise enough to suggest a way to bridge the Google gap, but I’m not sure we can. I believe that we can survive the transition. Maybe we can even try to understand each other and move the best stuff on both sides across that divide.

From here, it looks like the future of libraries and librarians is on the post-Google side. So I suggest that those of us on the pre-Google side listen carefully to the newcomers. After all, it is their future, so they should be given the autonomy and support to begin to design it right now.

