Gesture recognition

Gesture recognition is a topic in computer science and language technology that aims to interpret human gestures via mathematical algorithms. [1] It is a sub-discipline of computer vision. Gestures can originate from any bodily motion or state, but commonly originate from the face or hands. Current[when?] focus areas include emotion recognition from the face and hand gesture recognition, which lets users control or interact with devices using simple gestures without physically touching them. Many approaches have been made using cameras and computer vision algorithms to interpret sign language; however, the identification and recognition of posture, gait, proxemics and human behavior are also subjects of gesture recognition techniques. [2] Gesture recognition can be seen as a way for computers to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse. Using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the cursor will move accordingly. This could even make traditional input devices redundant.

A child is being sensed by a simple gesture recognition algorithm that detects hand location and movement

Gesture recognition is typically processed in middleware , with results transmitted to user applications.

Overview

Gesture Recognition Features:

  • More precise
  • High stability
  • Time saving when unlocking a device

Currently,[when?] the major application areas of gesture recognition are:

  • Automotive sector
  • Consumer electronics sector
  • Transit sector
  • Gaming sector
  • Smartphone unlocking
  • Defense [3]
  • Home automation
  • Automated sign language translation [4]

Gesture recognition can be conducted with techniques from computer vision and image processing . [5]

The literature includes ongoing work in the computer vision field on capturing gestures or more general human posture and movements by cameras connected to computers. [6] [7] [8] [9]

Gesture recognition and pen computing: Pen computing reduces the hardware impact of a system and also increases the range of physical-world objects usable for control beyond traditional digital objects such as keyboards and mice. Such implementations could enable a new range of hardware that does not require monitors. This idea may lead to the creation of holographic displays. The term gesture recognition has also been used to refer more narrowly to non-text-input handwriting symbols, such as inking on a graphics tablet, multi-touch gestures, and mouse gesture recognition. This is computer interaction through the drawing of symbols with a pointing device cursor. [10] [11] [12] (see pen computing)

Gesture types

In computer interfaces, two types of gestures are distinguished: [13] we consider online gestures, which can also be regarded as direct manipulations such as scaling and rotating, and, in contrast, offline gestures, which are usually processed after the interaction is finished; for example, a circle is drawn to activate a context menu.

  • Offline gestures: gestures that are processed after the user's interaction with the object. An example is the gesture to activate a menu.
  • Online gestures: direct manipulation gestures, used to scale or rotate a tangible object (a minimal handling sketch follows this list).
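
To make the distinction concrete, the following minimal sketch (in Python, with hypothetical data structures not taken from any cited system) treats an online pinch gesture as something applied continuously while input is in progress, whereas an offline stroke is only classified after it has finished, for example to decide whether a circle was drawn.

```python
import math

def handle_online_pinch(obj_scale, prev_dist, cur_dist):
    """Online gesture: rescale an object on every frame of a two-finger pinch."""
    if prev_dist > 0:
        obj_scale *= cur_dist / prev_dist   # applied immediately, frame by frame
    return obj_scale

def classify_offline_stroke(points):
    """Offline gesture: once the stroke ends, decide whether it was a circle."""
    if len(points) < 8:
        return None
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / len(radii)
    spread = max(radii) - min(radii)
    closed = math.hypot(points[0][0] - points[-1][0],
                        points[0][1] - points[-1][1]) < 0.3 * mean_r
    # A closed path with a roughly constant radius suggests a circle,
    # which an application could map to opening a context menu.
    return "circle" if closed and spread < 0.4 * mean_r else None
```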

Touchless interface

Touchless user interface is an emerging technology regarding gesture control. Touchless User Interface (TUI) is the process of commanding a computer through body motion and gestures without touching a keyboard, mouse, or screen. [14] In addition to gesture controls, touchless interfaces are becoming widely popular because they provide the ability to interact with the device without touching it.

Types of touchless technology

There are many devices that use this type of interface, such as smartphones, laptops, games, televisions, and music devices.

A one-of-a-kind touchless interface uses a smartphone's Bluetooth connectivity to activate the company's visitor management system. This prevents touching an interface during the COVID-19 pandemic. [15] [16]

Input devices

The ability to track a person's movements and determine what gestures they are making can be achieved through a variety of tools. Kinetic User Interface (KUI) is an emerging user interface that allows users to interact with computing devices through the motion of objects and bodies. [ citation needed ] Examples of KUI include tangible user interfaces and motion-aware games such as the Wii and Microsoft's Kinect, and other interactive projects. [17]

Although a large amount of research has been done in image/video based gesture recognition, there is some variation between implementations in the devices used and the environment.

  • Wired Gloves. These can provide input to the computer about the position and rotation of the hands using magnetic or inertial tracking devices. In addition, some gloves can detect finger bending with high accuracy (5–10 degrees), or even provide haptic feedback to the user, simulating the sense of touch. The first commercially available hand-tracking glove-type device was the Dataglove, [18] a glove-type device that could detect hand position, movement, and finger bending. It uses a fiber optic cable running across the back of the hand. Light pulses are generated and when the fingers are bent, light leaks through the tiny slits and the loss is registered, thereby estimating the hand posture.
  • Depth-aware cameras. Using specialized cameras such as structured light or time-of-flight cameras, one can generate a depth map of what is being seen through the camera at short range, and use this data to approximate a 3D representation of the scene. These can be effective for detecting hand gestures due to their short-range capabilities. [19]
  • Stereo cameras. Using two cameras whose relationship to each other is known, a 3D representation can be approximated from the cameras' outputs. To obtain the cameras' relationship, one can use a positioning reference such as a lexion-stripe or an infrared emitter. [20] In combination with direct motion measurement (6D-vision), gestures can be detected directly (a depth-from-disparity sketch follows this list).
  • Gesture-based controllers. These controllers act as an extension of the body so that when gestures are performed, some of their motion can be conveniently captured by software. An example of emerging gesture-based motion capture is skeletal hand tracking, which is being developed for virtual reality and augmented reality applications. An example of this technology is shown by the tracking companies uSens and Gestigon, which allow users to interact with their surroundings without controllers. [21] [22]
  • Wi-Fi sensing [23]
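
As a rough illustration of how stereo or depth-aware setups recover geometry, the sketch below uses the standard rectified-stereo relation (depth = focal length × baseline / disparity); the calibration numbers are purely illustrative assumptions, not values from any cited device.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of one matched point in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("the point must appear shifted between the two views")
    return focal_px * baseline_m / disparity_px

# Made-up calibration: a 700 px focal length, 6 cm baseline and 20 px disparity
# place the point roughly 2.1 m from the cameras.
print(depth_from_disparity(disparity_px=20, focal_px=700, baseline_m=0.06))
```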

Another example of this is mouse gesture tracking, where the motion of the mouse is correlated to a symbol being drawn by a person's hand, studying changes in acceleration over time to represent gestures. [24] [25] [26] The software also compensates for human tremor and inadvertent movement. [27] [28] [29] The sensors of these smart light-emitting cubes can be used to sense hands and fingers as well as other nearby objects, and can be used to process data. Most applications are in music and sound synthesis, [30] but can be applied to other fields.
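
The "changes in acceleration over time" mentioned above can be obtained from timestamped cursor positions with simple finite differences; the sketch below is a generic illustration (the function name and data format are assumptions, not part of any cited software).

```python
def finite_differences(samples):
    """samples: list of (t, x, y) cursor readings.
    Returns per-sample velocities (t, vx, vy) and accelerations (t, ax, ay)."""
    velocities = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        velocities.append((t1, (x1 - x0) / dt, (y1 - y0) / dt))
    accelerations = []
    for (t0, vx0, vy0), (t1, vx1, vy1) in zip(velocities, velocities[1:]):
        dt = t1 - t0
        accelerations.append((t1, (vx1 - vx0) / dt, (vy1 - vy0) / dt))
    return velocities, accelerations
```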

Single camera. A standard 2D camera can be used for gesture recognition where the resources/environment would not be convenient for other forms of image-based recognition. It was previously thought that a single camera might not be as effective as stereo or depth-aware cameras, but some companies are challenging this theory. Software-based gesture recognition technology using a standard 2D camera can detect robust hand gestures.

Algorithms

Various methods of tracking and analyzing gestures exist, and some basic layouts are outlined below. For example, volumetric models convey the essential information required for an elaborate analysis; however, they prove to be very intensive in terms of computational power and require further technological development to be applicable to real-time analysis. Appearance-based models, on the other hand, are easier to process but usually lack the generality required for human-computer interaction.

Depending on the type of input data, the interpretation of gestures can be approached in different ways. However, most techniques rely on cardinal points represented in a 3D coordinate system. Depending on the relative motion of these, the quality of the input and the algorithm's approach, gestures can be detected with high accuracy.

To explain body movements, they have to be classified according to general properties and the message that the movements can convey. For example, in sign language each gesture represents a word or phrase.

Some literature distinguishes two different approaches in gesture recognition: a 3D-model-based one and an appearance-based one. [31] The foremost method makes use of 3D information about key elements of the body parts in order to obtain several important parameters, such as palm position or joint angles. On the other hand, appearance-based systems use images or videos for direct interpretation.

A real hand (left) is interpreted as a collection of vertices and lines in the 3D mesh version (right), and the software uses their relative positions and interactions to infer the gesture.

3D model-based algorithms

The 3D model approach may use volumetric or skeletal models, or even a combination of both. The volumetric approach has been used heavily in the computer animation industry and for computer vision purposes. Models are usually created from complex 3D surfaces, such as NURBS or polygon meshes.

The drawback of this method is that it is very computationally intensive, and systems for real-time analysis have yet to be developed. For the time being, a more interesting approach would be to map simple primitive objects to the person's most important body parts (for example cylinders for the arms and neck, a sphere for the head) and analyze the way these interact with each other. Furthermore, some abstract structures such as super-quadrics and generalized cylinders may be even more suitable for approximating the body parts.

The skeletal version (right) effectively models the hand (left). It has fewer parameters than the volumetric version and is easier to compute, making it suitable for real-time gesture analysis systems.

Skeletal-based algorithms

Instead of using intensive processing of the 3D model and dealing with a lot of parameters, one can use a simplified version with joint angle parameters along with segment lengths. This is known as a skeletal representation of the body, where a virtual skeleton of the person is computed and parts of the body are mapped to certain segments. The analysis here is done using the position and orientation of these segments and the relationship between each of them (for example the angle between the joints and the relative position or orientation).
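
As a small illustration of the kind of parameter a skeletal model works with, the following sketch computes one joint angle from three tracked 3D keypoints; the keypoint coordinates are made up for the example.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c (3D points)."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    n1 = math.sqrt(sum(p * p for p in v1))
    n2 = math.sqrt(sum(q * q for q in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

# Illustrative keypoints (metres): shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.3, 1.1, 0.1), (0.6, 1.3, 0.2)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # elbow flexion angle
```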

Advantages of using skeletal model:

  • The algorithms are fast because only the key parameters are analyzed.
  • Pattern matching against a template database is possible (a simple matching sketch follows this list)
  • Using key points allows the detection program to focus on important parts of the body
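
One minimal way to realize the template-matching advantage is a nearest-neighbour comparison between an observed joint-angle vector and a small database of stored templates, as sketched below; the gesture names, angle values and query are illustrative assumptions only.

```python
import math

# Pretend database: each gesture is a short vector of characteristic joint angles.
templates = {
    "wave":      [165.0, 40.0, 170.0],
    "thumbs_up": [95.0, 150.0, 20.0],
    "point":     [175.0, 10.0, 160.0],
}

def nearest_template(query_angles):
    """Assign the observed pose to the closest stored template (Euclidean distance)."""
    return min(templates, key=lambda name: math.dist(query_angles, templates[name]))

print(nearest_template([170.0, 15.0, 158.0]))  # -> "point"
```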

These binary silhouette (left) or contour (right) images represent typical input for appearance-based algorithms. They are compared with different hand templates, and if they match, the corresponding gesture is inferred.

Appearance-based models

These models no longer use a spatial representation of the body, because they derive the parameters directly from images or videos using a template database. Some are based on deformable 2D templates of human body parts, particularly hands. Deformable templates are sets of points on the outline of an object, used as interpolation nodes for the approximation of the object's outline. One of the simplest interpolation functions is linear, which performs an average shape from point sets, point variability parameters and external deformers. These template-based models are mostly used for hand-tracking, but can also be used for simple gesture classification.
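
The linear case described above can be read as: outline = mean shape + weighted sum of variability modes + an external deformation. The sketch below illustrates that combination with made-up node coordinates and modes.

```python
# Illustrative mean outline (4 nodes) and two variability modes; all values made up.
mean_shape = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (1.0, -1.0)]
modes = [
    [(0.0, 0.1), (0.0, 0.2), (0.0, 0.1), (0.0, -0.2)],   # e.g. an "open/close" mode
    [(0.1, 0.0), (0.0, 0.0), (-0.1, 0.0), (0.0, 0.0)],   # e.g. a "spread" mode
]

def deform(weights, offset=(0.0, 0.0)):
    """Outline = mean shape + sum_i weights[i] * modes[i] + external offset."""
    outline = []
    for j, (mx, my) in enumerate(mean_shape):
        dx = sum(w * modes[i][j][0] for i, w in enumerate(weights))
        dy = sum(w * modes[i][j][1] for i, w in enumerate(weights))
        outline.append((mx + dx + offset[0], my + dy + offset[1]))
    return outline

print(deform(weights=[0.5, -1.0], offset=(0.2, 0.0)))
```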

A second approach to gesture detection using appearance-based models uses image sequences as gesture templates. The parameters of this method are either the images themselves, or certain features derived from them. Most of the time, only one (monoscopic) or two (stereoscopic) views are used.

Electromyography-based models

Electromyography (EMG) deals with the study of electrical signals produced by muscles in the body. Through classification of the data obtained from the hand muscles, it is possible to classify the action and thus input the gesture to an external software. [1] Consumer EMG devices allow non-invasive methods such as arm or leg bands, and connect via Bluetooth. Because of this, EMG has an advantage over visual methods as the user does not need to face the camera to give input, allowing greater freedom of movement.
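
A common pipeline for this kind of input (described generically here, not tied to any particular device) windows the per-channel EMG signal, extracts simple features such as the root-mean-square amplitude, and classifies the resulting feature vector; the channel count, window contents and "trained" centroids below are illustrative assumptions.

```python
import math

def rms_features(window):
    """window: list of per-sample channel tuples -> one RMS amplitude per channel."""
    channels = list(zip(*window))
    return [math.sqrt(sum(s * s for s in ch) / len(ch)) for ch in channels]

# Pretend per-gesture centroids learned from labelled recordings (4 channels).
centroids = {
    "fist": [0.80, 0.75, 0.10, 0.12],
    "open": [0.15, 0.20, 0.70, 0.65],
    "rest": [0.05, 0.05, 0.05, 0.05],
}

def classify(window):
    """Assign the window to the gesture whose centroid is nearest in feature space."""
    feats = rms_features(window)
    return min(centroids, key=lambda g: math.dist(feats, centroids[g]))
```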

Challenges

There are several challenges associated with the accuracy and usefulness of gesture recognition software. There are limitations on the tools used for image-based gesture recognition and on image noise. Images or videos may not be in consistent lighting, or may not be in the same location. Items in the background or distinctive features of users can make identification more difficult.

The variety of implementations for image-based gesture recognition can also pose problems for the technology's viability for general use. For example, an algorithm calibrated for one camera may not work for another. The amount of background noise also causes tracking and recognition difficulties, especially when occlusions (partial and full) occur. Furthermore, the distance from the camera, and the camera's resolution and quality, also cause variations in recognition accuracy.

To capture human gestures by visual sensors, robust computer vision methods are also required, for example for hand tracking and hand gesture recognition [32] [33] [34] [35] [36] [37] [38] [39] [40] or for capturing head movements, facial expressions or gaze direction.

Social acceptability

A significant challenge to the adoption of gesture interfaces on consumer mobile devices such as smartphones and smartwatches arises from the social acceptability effect of gesture input. Although gestures can facilitate fast and accurate input on many new form-factor computers, their adoption and usefulness are often limited by social factors rather than technical ones. To this end, designers of gesture input methods may try to balance both technical considerations and the user's desire to perform gestures in various social contexts. [41] Furthermore, different device hardware and sensing mechanisms support different types of recognizable gestures.

Mobile devices

Gesture interfaces on mobile and small form-factor devices are often supported by the presence of motion sensors such as inertial measurement units (IMUs). On these devices, gesture sensing relies on users performing movement-based gestures that can be recognized by these motion sensors. This can potentially make capturing signals from subtle or low-motion gestures challenging, as they may become difficult to distinguish from natural movements or noise. Through a survey and study of gesture usability, researchers found that gestures that incorporate subtle movement, that appear similar to existing technology, that look or feel similar to everyday actions, and that are enjoyable were more likely to be accepted by users, whereas gestures that look strange, are uncomfortable to perform, or interfere with communication were more likely to be rejected. [41] The social acceptability of mobile device gestures relies heavily on the naturalness of the gesture and the social context.

On-body and wearable computers

Wearable computers generally differ from traditional mobile devices in that their location of use and interaction takes place on the user's body. In these contexts, gesture interfaces may be preferred over traditional input methods, as their small size makes touch-screens or keyboards less appealing. Nevertheless, they share many of the same social acceptability obstacles as mobile devices when it comes to gestural interaction. However, the possibility of wearable computers being hidden from sight or integrated into other everyday objects, such as clothing, allows gesture input to mimic common clothing interactions, such as adjusting a shirt collar or rubbing one's front pants pocket. [42] [43] A major consideration for wearable computer interaction is the location for device placement and interaction. A study exploring third-party attitudes towards wearable device interaction, conducted across the United States and South Korea, found differences in how men and women perceive wearable computing use, partly owing to different areas of the body being considered socially sensitive. [43] Another study investigating the social acceptability of on-body projected interfaces found similar results, with both studies labelling the areas around the groin and upper body (for women) as least acceptable, while the areas around the forearm and wrist were the most acceptable. [44]

Public installations

Public installations, such as interactive public displays, allow access to information and display interactive media in public settings such as museums, galleries, and theaters. [45] While touch screens are a frequent form of input for public displays, gesture interfaces provide additional benefits such as improved legibility, interaction from a distance, and improved discoverability, and they can favor performative interaction. [42] An important consideration for gestural interaction with public displays is the high probability or expectation of a spectating audience. [45]

"Gorilla Hand"

The "gorilla arm" was a side effect of vertically oriented touch-screen or light-pen use. Over a period of prolonged use, users' hands began to feel fatigued and/or restless. This effect contributed to the decline of touch-screen input in the 1980s, despite its initial popularity. [46] [47]

To measure arm fatigue and the gorilla arm side effect, researchers developed a technique called Consumed Endurance. [48] [49]
