Ideas for text recognition and virtual reality

Oct 11, 2017 03:45


I took part in a competition but did not win a prize. So that the ideas do not go to waste, I am publishing them here.

Essence:

Recognizing the meaning of screenshots and video recordings in screen capture and monitoring systems, then organizing the images by semantic tags in order to structure and analyze how workplaces are used.

Description:

There are systems that capture (record) images from monitor screens in order to monitor the work of employees. At present this monitoring is done by routinely reviewing the photo and video records by hand. Adding text and image recognition would make it possible to determine the meaning of the captured content, assign semantic tags, organize the records, search them, run analytics along different dimensions, and give recommendations for management decisions.
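As a rough illustration of the tagging step, here is a minimal sketch in Python. It assumes the text has already been recognized from each captured screen by some OCR step that is not shown; the tag vocabulary `TAG_KEYWORDS` and the function names are hypothetical and only illustrate keyword-based tagging, not a full semantic analysis.

```python
import re
from collections import Counter

# Hypothetical tag vocabulary: tag -> keywords that suggest it.
TAG_KEYWORDS = {
    "spreadsheet": {"excel", "sheet", "cell", "formula"},
    "email": {"inbox", "reply", "subject", "outlook"},
    "social": {"feed", "likes", "friends", "vk"},
}

def tag_text(recognized_text, min_hits=1):
    """Return the tags whose keywords occur in the recognized text, most hits first."""
    words = Counter(re.findall(r"[a-z0-9]+", recognized_text.lower()))
    hits = {
        tag: sum(words[k] for k in keywords)
        for tag, keywords in TAG_KEYWORDS.items()
    }
    ranked = sorted(hits.items(), key=lambda kv: -kv[1])
    return [tag for tag, n in ranked if n >= min_hits]

def summarize(screens):
    """Count how many captured screens received each tag."""
    totals = Counter()
    for text in screens:
        totals.update(tag_text(text))
    return totals
```

Calling `summarize` on the texts recognized during a working day gives a simple per-tag count that could feed the analytics described above.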

Problem:

In existing systems that capture (record) images from monitor screens to monitor employees, the records are reviewed by hand, as a routine inspection of photos and videos. Without text and image recognition there is no automatic way to determine the meaning of the content, assign semantic tags, organize the records, produce analytics along different dimensions, or give recommendations for management decisions.

The target audience:

Organizations that use employee monitoring systems. Vendors of software that captures the contents of monitor screens.

Uniqueness of the idea:

I could not find direct analogues. The closest in spirit are ABBYY Screenshot Reader and similar tools.

Essence:

Simple recognition of text areas that cannot be selected and copied: a simplified, cheaper analogue of ABBYY Screenshot Reader.

Description:

Sometimes you need to copy text that cannot be copied, for example because of copy protection, in some types of documents (such as parts of PDF documents), in window captions, and so on.

In these cases you can run recognition on the fragment of interest, which could be selected in different ways: with a rectangular frame, or even with a smart contour that snaps ("magnetizes") to the shape of the text lines. The function could be invoked from the context menu, by hotkeys, through browser extensions, or through the program's API.
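The "magnetized" selection could be sketched as follows. This is only an illustration in Python: it assumes some layout or OCR step (not shown) has already produced bounding boxes for the text lines, and the names `Box`, `overlap_ratio`, `snap_to_lines`, and the 0.5 threshold are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int

def overlap_ratio(line: Box, frame: Box) -> float:
    """Fraction of the line's area that lies inside the user's frame."""
    w = min(line.right, frame.right) - max(line.left, frame.left)
    h = min(line.bottom, frame.bottom) - max(line.top, frame.top)
    if w <= 0 or h <= 0:
        return 0.0
    area = (line.right - line.left) * (line.bottom - line.top)
    return (w * h) / area

def snap_to_lines(frame: Box, lines: List[Box], min_ratio: float = 0.5) -> Optional[Box]:
    """Snap a roughly drawn frame to the union of the text lines it mostly covers."""
    picked = [ln for ln in lines if overlap_ratio(ln, frame) >= min_ratio]
    if not picked:
        return None  # the frame does not cover any text line well enough
    return Box(
        min(b.left for b in picked),
        min(b.top for b in picked),
        max(b.right for b in picked),
        max(b.bottom for b in picked),
    )
```

The snapped box, rather than the hand-drawn one, would then be passed to the recognizer, so that lines are not cut in half by an imprecise selection.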

Unlike ABBYY Screenshot Reader, only the select-and-copy function is needed, as long as it is easy to invoke in various ways. The program could therefore cost several times less, which would allow wider use. In addition, the cheap analogue could serve as a promotion channel for ABBYY Screenshot Reader.

Problem:

Sometimes you need to copy text that cannot be copied, for example because of copy protection, in some types of documents (such as parts of PDF documents), in window captions, and so on. The existing analogue, ABBYY Screenshot Reader, has more features than this task needs and is correspondingly expensive.

The target audience:

Any PC and Internet user.

Uniqueness of the idea:

ABBYY Screenshot Reader

Analytics of street advertising

Essence:

Recognizing objects and meanings in still and moving images of varying complexity, in order to determine the purpose of the object or objects and the overall meaning of the depicted situation (scene) from the combination of objects, text, and speech; converting this information into a textual, semantic form that is convenient to organize and search.

Description:

Modern search engines perform full-text search over text resources, while a growing number of media files remain without any analysis of their internal content. Yet it is quite feasible to apply combined recognition to text, graphics, video, and audio in order to identify the meaning of scenes and scenarios in the context of the overall meaning and the accompanying external text.

Examples of use:

1. As part of an image search engine, which could be either a separate search service or a component used by existing search engines;

2. Recognizing the meaning of video episodes (for example, on YouTube): recognize speech and video along the video's timeline, create more accurate subtitles that take into account the meaning (context) of the situation shown, and create tags (hashtags) for each scene. Search engines could use this information to link not to the video as a whole but to specific fragments, showing how relevant each fragment is. It might look like this: in the search results, each video gets a horizontal strip (tape) of thumbnails that best characterize the fragments matching the search criteria, together with the relevance (degree of correspondence) of each episode to the query. The relevance could be drawn as a curve along the video's timeline. Hovering over any part of the strip could show the text from the video in a pop-up as accompanying information. Clicking the link to the video source could open the episode strip in a separate window with details: the strip expanded vertically on the left and the accompanying text on the right.

It would become possible to comment not on the whole video but on individual episodes, in blog form. For this, a separate aggregating resource could be created, similar to the Disqus engine, extending the capabilities of YouTube and similar services.

3. By recognizing the situation (scene) in images, one can assess their semantic and aesthetic value and also detect dangerous images (obscene language, pornography, violence, etc.). This data can be used by various filters of dangerous content, both third-party (for example, AdBlock) and one's own. Within a video, episodes subject to an age restriction can be recognized and marked automatically.

4. In applications for social networks, blogs, etc.: when a post is published with images, the accompanying text (description) may not be enough, so the image can be analyzed as well. Take, for example, the very popular "Give it away" groups on VK. At present, subscribers of such groups have to check manually for new items, respond to a notification about every post, or use applications that analyze only the accompanying text: the user enters a list of items of interest in the application's settings, and the application promptly notifies the user when such an item appears. This relies on the text alone, which may be missing or insufficient. The process could be fully automated and made much more accurate by analyzing the photographs of the items in addition to the accompanying text.
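The fragment-level video search from the examples above could be sketched like this. It is a minimal illustration in Python that assumes each video has already been split into timed segments with recognized text (subtitles, speech, scene tags); the scoring by simple word overlap and the names `relevance_curve` and `best_fragments` are hypothetical simplifications.

```python
import re

def tokenize(text):
    """Lowercased set of words in a text."""
    return set(re.findall(r"\w+", text.lower()))

def relevance_curve(segments, query):
    """segments: list of (start_sec, end_sec, text).

    Returns (start_sec, end_sec, score) per segment, where score is the
    share of query words found in the segment text, from 0.0 to 1.0.
    """
    q = tokenize(query)
    curve = []
    for start, end, text in segments:
        words = tokenize(text)
        score = len(q & words) / len(q) if q else 0.0
        curve.append((start, end, score))
    return curve

def best_fragments(segments, query, top=3):
    """The most relevant fragments, best first, skipping those with no match."""
    ranked = sorted(relevance_curve(segments, query), key=lambda s: -s[2])
    return [s for s in ranked[:top] if s[2] > 0]
```

The list returned by `relevance_curve` is what would be drawn as the relevance curve on the timeline, and `best_fragments` picks the thumbnails for the strip in the search results.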

Problems:

Lack of combined recognition of media resources: search engines rely on text resources only.

No way to discuss individual episodes and scenes rather than the video file as a whole.

Lack of full protection against inappropriate media content, including individual video fragments.

Lack of mechanisms for full analysis of posts that contain media.

The target audience:

Internet users, subscribers of social networks, blogs, etc., search engines.

Essence:

Recognizing price tags, receipts, and the appearance of a product in order to identify and locate it, and showing information about products and organizations on maps and in augmented reality. This complements and simplifies barcode-based technologies and services for identifying goods.

Description:

There are services that identify products by barcode, for example Rate & Goods, using their own product database and/or GS1 association data. These services let users leave reviews and ratings for products, as on Yandex.Market. They began to work well relatively recently, but so far they have had little promotion. However, many kinds of goods cannot be identified precisely: they are missing from the databases, or they are packaged at the retail outlet without a global barcode. Nevertheless, by recognizing not only barcodes but also the contents of price tags, and even the appearance of the product as seen through a smartphone or augmented reality glasses, together with the layout and stocking of the sales floor, it is possible to identify the product and its location in the store with much higher accuracy and probability, in theory without reading the barcode at all, that is, at a distance. The combination of the product's location, its appearance, and the identification statistics of previous buyers is enough to identify goods at a fairly large distance (up to several meters) and to display information about them over their image on a smartphone or augmented reality glasses: the rating (with the option to open reviews), the average and minimum price, and an estimate of identification accuracy (for example, as a color scale or a geometric figure: the higher the accuracy, the greener or brighter the scale or figure). This would let buyers choose goods much faster and protect them from low-quality goods and from marketers' manipulations and deceptions, for example promotions in which an old high price is replaced by a new reduced one although the average or even the minimum price of the product at other outlets in the region hardly differs from the "reduced" price.
The same system would let you quickly find a good, inexpensive product in the outlets and point to it in augmented reality, and would warn about substandard ones.
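One way to picture how several weak signals add up to an identification, and how its accuracy could be shown as a color, is the sketch below. It is written in Python and uses a "noisy-OR" combination, which is my assumption and not something the idea prescribes; the reliability numbers in `RELIABILITY` and the function names are hypothetical.

```python
# Hypothetical reliability of each evidence source (how much a perfect
# match from that source alone should be trusted).
RELIABILITY = {
    "barcode": 0.99,
    "price_tag": 0.8,
    "appearance": 0.6,
    "location": 0.5,
    "history": 0.4,  # identifications made by previous buyers
}

def identification_confidence(evidence):
    """Combine match scores (0..1 per source) with a noisy-OR rule.

    Sources missing from `evidence` contribute nothing.
    """
    miss = 1.0
    for source, reliability in RELIABILITY.items():
        score = min(max(evidence.get(source, 0.0), 0.0), 1.0)
        miss *= 1.0 - reliability * score
    return round(1.0 - miss, 3)

def confidence_color(confidence):
    """Map confidence 0..1 to an RGB color from red (low) to green (high)."""
    c = min(max(confidence, 0.0), 1.0)
    return (round(255 * (1 - c)), round(255 * c), 0)
```

With such a rule, a price tag plus a matching appearance and shelf location can give high confidence even when no barcode is readable, which is the situation the description has in mind.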

In many stores it is hard to match a product with its price tag, even though the price is public information. By identifying the product from its barcode, or even from its appearance and location, you can look up its price and display it in augmented reality over the product image, without having to search for the price tag.

The system can work as a stand-alone service and offer an API for third-party services such as Yandex.Market, Rate & Goods, and Foodid (data on promotions and discounts). Ideally it would combine the data of these services into a single multi-service that lets users see (on the map and in the store) and find interesting products of suitable quality at lower prices, that is, save money and avoid poor-quality products. The average benefit to an average person from using the service is easy to calculate and can be used to promote it. In addition to showing product data on a map of the town (and on the floor plan of a shopping center), one could add subscriptions (notifications) for goods with low prices and high ratings that are located along the user's regular route (for example, home to work) or that the user is passing inside the store. This matters because constantly looking at a smartphone screen, or even at a floor plan of the sales floor, is likely to be inconvenient, unless of course augmented reality glasses are used.

Unlike existing services, a more interactive system would motivate buyers to leave ratings and reviews more actively and more promptly. It might even be justified to reward those who leave feedback: for example, to credit 5 rubles to a mobile phone account for a review and 50 kopecks for a rating. The rates could vary with the total number of reviews for the item: the first reviewer gets the largest reward, and later ones progressively less.
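A declining reward of this kind could be computed as in the sketch below (Python). The two base rates are the ones mentioned in the text, stored in hundredths of the currency unit; the shape of the decline (dividing by the number of existing reviews plus one) and the minimum payouts in `FLOOR` are my assumptions, chosen only for illustration.

```python
# Base rates from the text (5 and 0.5 currency units), in hundredths.
BASE = {"review": 500, "rating": 50}
# Hypothetical minimum payouts, also in hundredths.
FLOOR = {"review": 50, "rating": 5}

def reward(kind, existing_count):
    """Reward for a new review or rating, given how many the item already has.

    The first contributor gets the full base rate; later ones get
    base / (existing_count + 1), but never less than the floor.
    """
    if existing_count < 0:
        raise ValueError("existing_count must be non-negative")
    return max(FLOOR[kind], BASE[kind] // (existing_count + 1))
```

For example, under these assumptions the fifth review of an item would earn a fifth of the base rate.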

When a product has no universal barcode, and to simplify the review process, the system should be able to recognize receipts and/or obtain the detailed list of purchased goods from payment systems. It would then be easy to open the review page for an item and leave a rating or review.

The system is universal and, with some adjustments, can be used throughout the world, where consumption volumes and contextual advertising revenues are orders of magnitude higher than in the Russian Federation.

The system motivates manufacturers and sellers of quality products to adopt technologies that simplify indoor navigation and improve service, to make full use of global barcodes for product labeling, and to create conditions for fair competition among outlets, which in turn makes the system itself even more relevant.

The application can recognize and locate products inside a store by the type of shelving and its contents, and can serve as a navigator to the location of the goods.

In addition, users need to see product ratings when visiting online stores that provide no rating function, either their own or from Yandex.Market. In browsers this is solved by installing extensions that modify the code of loaded web pages, which is difficult for many users and not universal. A much more universal and convenient approach would be an application for mobile and desktop devices that recognizes the product directly from the screen contents, identifying it by brand, article number, URL, and if necessary even the image. It would then emulate a layer over the screen contents and display the relevant product information next to the product in free space. The application could set reference points in the image and, when the page is scrolled or its scale or rotation angle changes, move the product information in the emulated layer in sync. Such an application, recognizing screen images and showing information about them in an emulated layer, could be used not only for online storefronts but also to show additional information on top of maps (for example, 2Gis), tied to the reference points of geo-objects.
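Keeping an overlay attached to its product while the page scrolls, zooms, or rotates can be sketched with two reference points. The Python example below assumes that the same two reference points have been found before and after the change (how they are detected is not shown) and that the change is a combination of shift, uniform scale, and rotation; the function names are hypothetical.

```python
def similarity_from_points(p1, p2, q1, q2):
    """Find the transform that maps reference points p1 -> q1 and p2 -> q2.

    Points are (x, y) tuples. The transform is z -> a*z + b over complex
    numbers, which covers translation, uniform scaling, and rotation.
    """
    zp1, zp2 = complex(*p1), complex(*p2)
    zq1, zq2 = complex(*q1), complex(*q2)
    if zp1 == zp2:
        raise ValueError("reference points must be distinct")
    a = (zq2 - zq1) / (zp2 - zp1)  # scale and rotation
    b = zq1 - a * zp1              # translation
    return a, b

def move_overlay(position, transform):
    """Apply the transform to the overlay's old (x, y) position."""
    a, b = transform
    z = a * complex(*position) + b
    return (z.real, z.imag)
```

After each scroll or zoom, the application would recompute the transform from the reference points and call `move_overlay` for every label in the emulated layer.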

Problems:

Some types of goods cannot be identified by global barcodes, so their ratings and reviews cannot be looked up;

Identifying goods by barcode alone is inconvenient;

Services for working with goods and retail outlets are fragmented and hard to use (especially for older people and children);

Buyers are not active in leaving ratings and reviews, so the information is less useful, the statistical base is too small, and the ratings are less objective;

Goods are hard to find inside retail premises;

Using many add-ons and "applications for applications" to show more product information on sites and maps is complicated and inconvenient;

Outlets can market unfairly because there are no adequate means of quickly comparing them by the quality of their goods;

Protection from poor-quality goods while shopping is inadequate, as is navigation to quality products along a route in town and inside a shopping center;

Ways of saving the buyer's money are not effective or fast enough when searching for many inexpensive goods and while shopping, because the time available to decide on a particular product in a store is limited and also has a price: the buyer has to grab things rather than choose and look up each product in a database.

The target audience:

Any buyers at virtually any store, including online stores.

Uniqueness of the idea:

In aggregate, Yandex.Market, Rate & Goods, Eating, etc.

Shopping malls have applications and sites with catalogs of goods, new arrivals, and promotions. Without access to the application's screen (frame), it is hard to identify a product and show an independent rating for it. Websites are easier, because browser extensions can modify the code of loaded pages, and even there the product image can serve as an additional means of more accurate identification. To identify a product inside mobile applications, one can use an application that recognizes the contents of any other application's screen, product catalogs in particular, and reads the article number, description, and photo. The rating (and other relevant information) can then be drawn on top of the screen, as if in a separate "emulated" layer. Reference points are set in the frame; when the contents change through scrolling or zooming, the overlays are positioned more accurately by aligning them to the reference points, in addition to handling the scroll and zoom events directly.

Similar recognition can be done on top of maps by emulating a layer: the coordinates and scale can be taken from a link or obtained by recognizing the names of streets and house numbers, with more precise positioning by the reference points of geo-objects.

Essence:

Placing highly targeted, contextual advertising in augmented reality so that it overlays the traditional advertising of ordinary reality. The user is paid for viewing advertising in augmented reality, and the advertising agency (platform) receives a commission.

Description:

In ordinary reality, advertising is not targeted; payment for advertising space goes to the landlords of that space and is recouped through higher prices for the advertisers' products and services. In virtual and augmented reality the owner of the space is the user, so the user should be paid for advertising. So as not to add to the user's information load, advertising in augmented reality should overlay the advertising of ordinary reality. It can also take into account the user's needs and the context (the current situation), that is, be as targeted as possible.

In this case, advertising in augmented reality would be much more effective and cheaper for advertisers, which would strengthen their competitive position and reduce advertising costs, and that in turn would have a positive effect on product prices.

In addition, advertisers (usually manufacturers and distributors) and trading platforms could cooperate, making it possible to track the link between a user viewing an advertisement for a product or service and that user buying it. This allows the effectiveness of advertising to be analyzed more accurately and the user to be rewarded with bonuses, discounts, and so on. In some cases ad units will be displayed at a changed size, larger or smaller, and the payment for impressions will change accordingly.

The information system that places advertising in augmented reality on top of traditional advertising (or independently of it) will receive a commission, just as happens with traditional contextual advertising.

The user has the right to turn off ad display for a time or to use filters, and also to cut out traditional advertising, much as AdBlock and Adguard browser plug-ins do.

Identifying traditional advertising in ordinary reality requires recognition algorithms and/or information about where the advertising is located. Users can supply this information themselves, just as AdBlock users can help identify and mark ad units on web pages. Besides advertising, an augmented reality filter could block various unwanted information from ordinary reality, for example graffiti and inscriptions, and could dim lights that are too bright and enhance pale, low-contrast objects.

Problem:

In ordinary reality, advertising is not targeted; payment for advertising space goes to the landlords of that space and is recouped through higher prices for the advertisers' products and services. Such untargeted advertising is ineffective and annoys people, and most of it has no positive effect on their lives. People cannot influence the advertising.

The target audience:

Any users.

The uniqueness of the idea:

Windows Holographic, and developments by the companies Matterport and Interactive Advertising Bureau

Essence:

Highlighting recognized objects (markers) in augmented reality in some way (for example, a glowing contour) and searching for information about the object: brief information is shown at first, and selecting the marker shows more.

Description:

This is not so much an idea as a proposal for a fairly simple way to implement VR, since such systems already exist. In augmented reality, objects are continually recognized and identified, taking into account their actual location on the map and their relation to other objects and contexts (situations). Buildings, inscriptions, and objects with distinctive features are easy to recognize.

If an object is in focus or the user "clicks" on it, a set of possible actions is shown at the side (the options depend on the type of object). The main action is to search for more complete information. Other actions: select only similar objects and search for information about them; add to favorites; specify the owner of the object (for example, oneself); set importance; ignore (importance = 0); manipulate the object through other services (possibly paid); paid services of the platform; send information about the object to another user; comment on the object or event, thereby creating a blog for the object or joining an existing one (including blog branches, chronological ones among them); and others.

To speed up the search, information about objects of interest can be loaded in advance for objects in the field of view and in the potential field of view, taking likely movement into account. The information can be downloaded locally in as much detail as possible, with a subscription to changes of that information on the Internet (or in the platform database). The reverse process is also possible: replenishing the platform database with updated local data.

The same idea can include changing the appearance of objects, for example for entering PIN codes safely: augmented reality can show a keypad ("calculator") with the digits shuffled randomly rather than in sequence, so that the user, typing the code in ordinary reality, makes seemingly erratic movements that are useless to an outside observer.
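The shuffled keypad can be illustrated with a short Python sketch. It assumes a ten-key pad whose physical positions are numbered 0 to 9 and a layout that only the wearer sees; the function names are hypothetical, and a real system would also need a secure channel between the glasses and the terminal, which is not shown.

```python
import secrets

def shuffled_keypad():
    """Digits 0-9 in random order; the list index is the physical key position."""
    digits = list("0123456789")
    secrets.SystemRandom().shuffle(digits)
    return digits

def presses_for_pin(pin, keypad):
    """Physical key positions the user presses to enter the PIN on this layout."""
    return [keypad.index(d) for d in pin]

def decode_presses(presses, keypad):
    """What the system recovers from the pressed positions and the layout it showed."""
    return "".join(keypad[i] for i in presses)
```

An observer sees only the pressed positions, which change with every newly shuffled layout and so do not reveal the PIN on their own.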

Problem:

I have not come across descriptions of augmented reality implemented in exactly this way.

The target audience:

VR users

The uniqueness of the idea:

In part: Daydream VR, the Layar browser, ARKit (among those I know of).

ocr, augmented reality
