A METHOD OF SEMANTIC TRANSMISSION AND RECEPTION OF A VIDEO FRAME FOR STATIC ENVIRONMENTS


ORDINARY APPLICATION

Published

Filed on 12 November 2024

Abstract

The present invention relates to a method of semantic transmission (100) and reception (200) of a video frame for static environments. The method includes semantic transmission of a video frame captured by a Wireless Sensor Network (WSN) node in a Static Environment Surveillance (SES) and semantic reception of a video frame by a WSN node in the SES. The invention uses an optimized wake-up frame format comprising a message type field identifying a message type of the wake-up frame, a protected field, an address field identifying an address of the wake-up frame, and a frame body field identifying data of the wake-up frame. Further, the video frame is reconstructed by placing the recovered region of interest at an appropriate position in a static environment background. {Figure 1 and Figure 2}

Patent Information

Application ID: 202441087158
Invention Field: ELECTRONICS
Date of Application: 12/11/2024
Publication Number: 47/2024

Inventors

Name | Address | Country | Nationality
Mohammed Zafar Ali Khan | EE 209, Electrical Engg, IIT Hyderabad Road, Near NH-65, Sangareddy, Kandi, Hyderabad, Telangana-502284, India | India | India
A P Vaideeswaran | Flat # 101, Srinivasa Residency, No. 2; 8th Cross, Kodihalli, Bangalore, Karnataka-560008, India | India | India
Komaragiri Sai Pranav | House No.: 68, Lane 5, Thirusankalp Villas, Near Vajra Pushpak Apartments, Eshwar Villas Road, Nizampet, Hyderabad, Telangana-500090, India | India | India
Rayani Venkat Sai Rithvik | Flat No.: 204, Block-B, VSR Celestial Towers, Gajularamaram, Hyderabad, Telangana-500055, India | India | India
K R Nandakishore | Kuttikkattil House, Thrikkalathoor P.O, Mannoor, Ernakulam, Kerala-683541, India | India | India

Applicants

Name | Address | Country | Nationality
Indian Institute of Technology Hyderabad | IIT Hyderabad Road, Near NH-65, Sangareddy, Kandi, Telangana – 502284, India | India | India

Specification

Description: A METHOD OF SEMANTIC TRANSMISSION AND RECEPTION OF A VIDEO FRAME FOR STATIC ENVIRONMENTS

TECHNICAL FIELD

[1] The present disclosure relates to a system and an associated method of semantic transmission and reception of a video frame. In particular, the present disclosure relates to a system and an associated method of semantic transmission and reception of a video frame with an optimized wake-up radio protocol for static environments.

BACKGROUND OF THE INVENTION
[2] Background description includes information that may be useful in understanding the present invention.
[3] Surveillance has become a necessity in recent times. Most static environments have a camera-based detection system for protection and emergency response. The main focus of Wireless Sensor Networks (WSNs) deployed in such scenarios is to reduce the power consumption of the end WSN nodes. For most applications, this means optimizing the data representation and turning off the end devices when not in use.
[4] Imagine a setup consisting of a CCTV camera in a fixed position overlooking an area of interest. Throughout the day, the general background stays constant. When an alarm is triggered, the region of interest can be identified within this background. Transmitting and storing the entire footage, which includes a constant background, is redundant. The background is updated only if there are disturbances in the static background, and once in a while the full frame is sent to refresh the background information. Clearly, transmitting only the region of interest, which carries the main information, is more efficient, and even lower-capacity physical layers can be used.
[5] For the static environments, in an ideal scenario, the Station (STA) should be able to differentiate between the surveillance event Wakeup Radio (WuR) Frame, which contains the data regarding objects of interest, and other WuR routines. The STA should be able to initiate further routines pertinent to Static Environment Surveillance (SES), given that a WuR frame is received.
[6] Among the existing technologies for static environment applications, camera-based detection is coming to the forefront, since video evidence carries substantial information regarding the event of interest. The conventional technology fails to modify the existing frame format to tackle static environment applications specifically and to reduce the power consumption of the WSNs responsible for static environment applications by utilizing semantic communication techniques.
[7] Semantic Communication aims at transmitting only the core information and nothing else. For image transmission, the incorporation of semantic communication helps in representing the data more efficiently and increases the throughput substantially. Traditional solutions make use of semantic segmentation models to find the semantic map of the image and Generative Adversarial Networks (GANs) to recover the image from the corresponding semantic map. The GAN recovers the image based on a shared database. This may not guarantee exact reconstruction with the specific characteristics of the area of interest and will be heavily biased toward the database. Moreover, discrepancies in the semantic map due to inaccuracies of the semantic segmentation models heavily influence the performance of the GANs.
[8] On any given day, the time corresponding to events of interest is only a tiny fraction of the total time. Thus, any sensor network that does not optimize for this fact cannot be the best in energy consumption. The previous Wi-Fi-based IoT standards required amendments for the very same reason. Improving energy efficiency is vital, especially for IoT applications that often involve small-battery devices. In addition to the above issue, providing connectivity to many devices connected to a single Access Point (AP) was challenging. Subsequently, 802.11ah (Wi-Fi HaLow) was developed, which did the following:
[9] 802.11ah introduced Target Wake Time (TWT), which allows the AP to schedule a frame exchange with a station in advance. The station does not need to listen to the channel at other times, enhancing energy efficiency. However, Wi-Fi HaLow operated in the sub-1 GHz range, making it backward incompatible with legacy networks. Moreover, the introduced TWT had a significant drawback regarding clock inaccuracies, with the standard allowing drifts of up to 100 ppm; this translates to a drift of up to 0.36 seconds per hour. The STA waking time can also be reduced further.
[10] The appearance of 802.11ax introduced new channel access methods, thereby reducing contention-based delays and power consumption. It also adapted TWT from 802.11ah to operate at a traditional frequency. The IEEE802.11ba amendment was introduced to remedy the disadvantages of Wi-Fi HaLow, whereby it introduced an auxiliary radio called Wake-Up Radio (WuR). WuR receives unique wake-up frames or synchronization information from the AP and generates an interrupt to turn on the Primary Communication Radio (PCR). The standard includes a legacy preamble in its Physical Protocol Data Unit (PPDU) format.
[11] This ensures backward compatibility with previous Wi-Fi standards. Every WuR transmission starts with a legacy Wi-Fi preamble transmitted in 20 MHz bands. The preamble contains three fields: the Short Training Field (L-STF), used for packet detection and coarse frequency offset estimation; the Long Training Field (L-LTF), used for timing synchronization and fine frequency offset estimation; and the Signal field (L-SIG), which determines the frame duration. Once decoded by legacy devices, the preamble indicates how long the channel will be physically busy, covering the whole WuR frame duration. The preamble is followed by a BPSK-Mark field, which is added to prevent 802.11n devices from switching to the channel idle state. The rest of the WuR frame (i.e., WuR Sync and WuR Data) can only be received by the WuR. To reduce receiver power consumption, a very simple on-off keying (OOK) modulation-based receiver with a narrow band of 4 MHz is used.
[12] Thus, there is a need for an improved method of semantic transmission and reception of a video frame in a Static Environment Surveillance (SES) which can not only reduce the power consumption of the SES but also reduce the redundancy in the transmitted and received bits.


OBJECTS OF THE INVENTION
[13] Some of the objects of the present disclosure, which at least one embodiment herein satisfy, are listed herein below.
[14] It is an object of the present subject matter to provide a system and method for tackling SES applications specifically by reducing redundancy in the transmitted bits.
[15] It is another object of the present subject matter to provide a system and method for reducing the power consumption of the WSNs responsible for surveillance by utilizing semantic communication and Wakeup Radio techniques.
[16] It is yet another object of the present subject matter to provide a system and method for extracting the region of interest using deep learning modules and transmitting only the region of interest in order to save bits.
[17] It is yet another object of the present subject matter to provide a system and method for modification of the existing frame format of the IEEE802.11ba standard by reducing the number of bits used per packet.
[18] It is yet another object of the present subject matter to provide a system and method for reconstructing the image by blending the Region of Interest (RoI) onto the static background.
[19] These and other objects and advantages will become more apparent when reference is made to the following description and accompanying drawings.

SUMMARY OF THE INVENTION
[20] This summary is provided to introduce concepts related to a method and a system of semantic transmission and reception of a video frame with a wake-up radio protocol for static environments. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[21] In an aspect of the present disclosure, there is provided a method and a system of semantic transmission of a video frame captured by a Wireless Sensor Network (WSN) node in a Static Environment Surveillance (SES). The method comprises detecting an object of interest in the video frame and, upon detection of the object of interest in the video frame, identifying and extracting a region of interest in the video frame. Further, the method includes generating a wake-up signal detectable by a wake-up receiver at another WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format. After the said generation of the wake-up signal, the method includes converting the extracted region of interest to a bitstream. Finally, the method includes transmitting the bitstream of the extracted region of interest after transmitting the wake-up signal. It is pertinent to note that the optimized wake-up frame format of the disclosed method includes a plurality of optimized frame fields for containing non-redundant and critical data, the plurality of optimized frame fields being a proper subset of a plurality of standard fields defined by a predetermined wireless communication standard.
[22] In another aspect of the present disclosure, one or more fields of the plurality of standard fields, containing redundant and less critical data are excluded in the optimized frame format.
[23] In another aspect of the present disclosure, the plurality of optimized frame fields comprise a message type field identifying a message type of the wake-up frame, a protected field, an address field identifying an address of the wake-up frame, and a frame body field identifying data of the wake-up frame.
[24] In another aspect of the present disclosure, the predetermined wireless communication standard is IEEE802.11ba.
[25] In another aspect of the present disclosure, the extraction of the region of interest is performed by using deep-learning based techniques comprising semantic segmentation or object detection.
[26] In another aspect of the present disclosure, the object of interest includes humans, pets, vehicles and packages.
[27] In yet another aspect of the present disclosure, the wake-up frame includes information of the object of interest.
[28] In yet another aspect of the present disclosure, the semantic segmentation includes classifying each pixel in the video data, identifying pixels pertinent to the region of interest, and obtaining a maximum and a minimum value of a coordinate of the identified pixels as a bounding box coordinate of the region of interest.
[29] In yet another aspect of the present disclosure, the object detection includes determining objects in the video data, identifying pixels in the objects pertinent to the region of interest, and obtaining a bounding box coordinate of the region of interest based on a coordinate of the identified pixels.
[30] In yet another aspect of the present disclosure, there is provided a method of semantic receiving of a video frame by a Wireless Sensor Network, WSN, node in a Static Environment Surveillance, SES. The method includes receiving a wake-up signal by a wake-up receiver of the WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format indicating a detection of a region of interest, and generating an interrupt to turn on a primary receiver of the WSN node. Further, the method includes receiving, by the primary receiver, the bitstream of the extracted region of interest. After receiving the bitstream of the extracted region of interest, the said bitstream is demodulated. Furthermore, the method includes initializing channel decoding of the demodulated bitstream of the extracted region of interest and parsing the decoded and demodulated bitstream of the extracted region of interest to recover the cropped region of interest. Finally, the method includes reconstructing the video frame by placing the recovered region of interest at an appropriate position in a static environment background.
[31] In yet another aspect of the present disclosure, parsing the bitstream file includes obtaining the coordinates and pixels of the region of interest.
[32] In yet another aspect of the present disclosure, the static environment background is received in advance and is exchanged periodically.
[33] In yet another aspect of the present disclosure, the appropriate position is determined based on a bounding box coordinate of the region of interest.
[34] Various objects, features, aspects, and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

[35] The illustrated embodiments of the subject matter will be understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and methods that are consistent with the subject matter as claimed herein, wherein:
[36] Figure 1 depicts a flowchart illustrating a method of semantic transmission of a video frame captured by a Wireless Sensor Network (WSN) node in a Static Environment Surveillance, SES, in accordance with an exemplary embodiment of the present disclosure;
[37] Figure 2 illustrates a flowchart for a method of semantic receiving of a video frame by a Wireless Sensor Network, WSN, node in a Static Environment Surveillance, SES, in accordance with an exemplary embodiment of the present disclosure;
[38] Figure 3 illustrates the semantic transmission and receiving of a video frame by WSNs, in accordance with an exemplary embodiment of the present disclosure;
[39] Figure 4 illustrates a standard wake-up frame format used conventionally in accordance with the prior art;
[40] Figure 5 illustrates a modified or optimized wake-up frame format, in accordance with an embodiment of the present disclosure;
[41] Figure 6 illustrates an exemplary system facilitating a method of semantic transmission and receiving of the video frame setup, in accordance with an embodiment of the present disclosure;
[42] Figure 7 illustrates graphical representations of test results by baseline, semantic segmentation and object detection methods, in accordance with an embodiment of the present disclosure; and
[43] Figure 8 illustrates a performance of the modified wake-up frame format against the standard frame format, in accordance with an embodiment of the present disclosure.
[44] The figures depict embodiments of the present subject matter for the purposes of illustration only. A person skilled in the art will easily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the disclosure described herein.

DETAILED DESCRIPTION
[45] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.
[46] While the embodiments of the disclosure are subject to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the figures and will be described below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed; on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
[47] The terms "comprises", "comprising", or any other variations thereof used in the disclosure are intended to cover a non-exclusive inclusion, such that a device, system, or assembly that comprises a list of components does not include only those components but may include other components not expressly listed or inherent to such system, assembly, or device. In other words, one or more elements in a system or device preceded by "comprises… a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or device.
Exemplary Implementations
[48] Figure 1 depicts a flowchart 100 illustrating a method of semantic transmission of a video frame captured by a Wireless Sensor Network (WSN) node in a Static Environment Surveillance, SES, in accordance with an exemplary embodiment of the present disclosure. The method includes detecting 102 an object of interest in the video frame and, upon detection of the object of interest in the video frame, identifying 104 and extracting a region of interest in the video frame. Further, the method includes generating 106 a wake-up signal detectable by a wake-up receiver at another WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format 116. After the said generation of the wake-up signal, the method includes converting 108 the extracted region of interest to a bitstream. Finally, the method includes transmitting 110 the encoded and modulated bitstream of the extracted region of interest after the wake-up signal. It is pertinent to note that the present disclosure uses an optimized wake-up frame format, wherein the optimized frame format 116 includes a plurality of optimized frame fields for containing non-redundant and critical data, the plurality of optimized frame fields being a proper subset of a plurality of standard fields defined by a predetermined wireless communication standard.
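For readers who prefer code, the transmit-side flow of Figure 1 can be summarized in a minimal sketch. The disclosure does not fix concrete APIs, so each numbered step is injected as a callable; every name below is a hypothetical placeholder for the corresponding block, not an interface defined by the disclosure.

```python
from typing import Callable, Sequence, Tuple
import numpy as np

# Minimal sketch of the transmit-side flow of method 100 (Figure 1).
# All parameter names and injected callables are hypothetical placeholders.
def semantic_transmit(
    frame: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[str]],              # step 102
    extract: Callable[[np.ndarray], Tuple[np.ndarray, tuple]],  # step 104
    send_wakeup: Callable[[Sequence[str]], None],               # step 106
    to_bitstream: Callable[[np.ndarray, tuple], str],           # step 108
    send_bits: Callable[[str], None],                           # step 110
) -> None:
    objects = detect(frame)
    if not objects:
        return  # no object of interest: transmit nothing, save power
    roi, bbox = extract(frame)
    send_wakeup(objects)  # wake-up frame (optimized format 116) goes first
    send_bits(to_bitstream(roi, bbox))  # then the encoded/modulated RoI bits
```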
[49] In another aspect of the present disclosure, one or more fields of the plurality of standard fields, containing redundant and less critical data are excluded in the optimized frame format.
[50] In another aspect of the present disclosure, the plurality of optimized frame fields comprise a message type field identifying a message type of the wake-up frame, a protected field, an address field identifying an address of the wake-up frame, and a frame body field identifying data of the wake-up frame.
[51] In another aspect of the present disclosure, the predetermined wireless communication standard is IEEE802.11ba.
[52] In another aspect of the present disclosure, the extraction of the region of interest is performed by using deep-learning based techniques comprising semantic segmentation or object detection.
[53] In another aspect of the present disclosure, the object of interest includes humans, pets, vehicles and packages.
[54] In yet another aspect of the present disclosure, the wake-up frame includes information of the object of interest.
[55] In yet another aspect of the present disclosure, the semantic segmentation includes classifying each pixel in the video data, identifying pixels pertinent to the region of interest, and obtaining a maximum and a minimum value of a coordinate of the identified pixels as a bounding box coordinate of the region of interest.
[56] In yet another aspect of the present disclosure, the method includes the step of transmitting the optimized frame format repeatedly, rather than using the normal methods of transmission based on the predetermined wireless communication standard.
[57] In yet another aspect of the present disclosure, the wake-up signal is generated based on the predetermined wireless communication standard.
[58] In one or more embodiments, the predetermined wireless communication standard is IEEE802.11ba, however, the wireless communication standard can be any other similar standard without deviating from the scope of the present disclosure.
[59] Figure 2 depicts a flowchart 200 illustrating a method of semantic receiving of a video frame by a Wireless Sensor Network, WSN, node in a Static Environment Surveillance, SES, in accordance with an exemplary embodiment of the present disclosure. The method includes receiving 202 a wake-up signal by a wake-up receiver of the WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format indicating a detection of a region of interest, and generating 204 an interrupt to turn on a primary receiver of the WSN node. Further, the method includes receiving 206, by the primary receiver, the bitstream of the extracted region of interest. After receiving the bitstream of the extracted region of interest, the said bitstream is demodulated 208. Furthermore, the method includes initializing 210 channel decoding of the demodulated bitstream of the extracted region of interest and parsing 212 the decoded and demodulated bitstream of the extracted region of interest to recover the cropped region of interest. Finally, the method includes reconstructing 214 the video frame by blending the recovered region of interest at an appropriate position in a static environment background.
[60] In yet another aspect of the present disclosure, parsing the bitstream file includes obtaining the coordinates and pixels of the region of interest.
[61] In yet another aspect of the present disclosure, the static environment background is received in advance and is exchanged periodically.
[62] In yet another aspect of the present disclosure, the appropriate position is determined based on a bounding box coordinate of the region of interest.
[63] Figure 3 illustrates the semantic transmission side 300T and receiver side 300R of the video frame by a WSN, in accordance with an exemplary embodiment of the present disclosure. In an embodiment of the present disclosure, the WSN at the transmission side 300T is a source WSN node. There might be a surveillance device such as a CCTV camera monitoring a specified area. As can be observed from figure 3, a number of blocks (302-326) are indicated based on the transmission and receiving steps of the present disclosure. At the outset, the object of interest is detected 102 by the WSN node, as depicted by the block 302. The method 100 thereafter checks whether there is some disturbance in the background and no class of interest is present in the background. If the said condition is false, the extracted region of interest 104, as depicted by the block 304, is obtained from the video frame. If the condition is true, the method 100 proceeds to the block 306. The object of interest may include humans, pets, vehicles, packages, etc. The extracted region of interest can be obtained by two methods, namely a semantic segmentation method and an object detection method.
[64] The semantic segmentation method is a visual task that classifies each pixel in an image. Fastseg, a semantic segmentation library available in Python, is used to obtain the semantic map. The region of interest is obtained by taking only the pixel values with labels pertinent to the class of interest. In the semantic segmentation method, the maximum and minimum values of the x and y coordinates of the pixels in the region of interest are used to obtain the bounding box coordinates. The object detection method, in turn, is a visual task that detects certain objects within an image and gives the coordinates of a bounding box for that class. Here, a lightweight implementation of YoloV3 is used. The extracted image of the region of interest can be obtained using the bounding box coordinates from either of the methods.
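As an illustration of the segmentation-based extraction just described, the sketch below derives a bounding box from a per-pixel semantic map (such as the output of a fastseg model). The label id `CLASS_OF_INTEREST` and the function name are assumptions for illustration, not part of the disclosure or the fastseg API.

```python
import numpy as np

# Sketch of RoI extraction from a per-pixel semantic map.
# CLASS_OF_INTEREST is a hypothetical label id; the disclosure
# does not fix a particular label set.
CLASS_OF_INTEREST = 11  # assumed id of the class of interest (e.g. person)

def roi_from_semantic_map(frame: np.ndarray, semantic_map: np.ndarray):
    ys, xs = np.nonzero(semantic_map == CLASS_OF_INTEREST)
    if xs.size == 0:
        return None, None  # no RoI found: fall back to sending the full frame
    # Bounding box from the min/max x and y coordinates of the class pixels.
    x, y = int(xs.min()), int(ys.min())
    w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
    return frame[y:y + h, x:x + w], (x, y, w, h)
```

Returning None here corresponds to the fallback option mentioned later, in which the whole image is sent when the model fails to find the region of interest.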
[65] After identifying and extracting the region of interest 104, the wake-up signal is generated 106, as depicted by the block 312. The wake-up signal is detectable by the wake-up receiver at another WSN node. In an embodiment of the present disclosure, the other WSN node, at the receiver side 300R, is a destination node. Further, the wake-up signal includes the wake-up frame of an optimized frame format 116. The optimization of the standard frame format used in the prior art into the optimized frame format 116 is delineated in figure 4 and figure 5. The transmission side 300T and the receiver side 300R correspond to methods 100 and 200, respectively.
[66] Figure 4 illustrates a standard wake-up frame format 402 used conventionally. The IEEE802.11ba Type field is a 3-bit field indicating the application of the WuR frame.

Type | Description
0 | WuR Beacon
1 | WuR Wake-Up
2 | WuR Vendor Specific
3 | WuR Discovery
4 | WuR Short Wake-Up
5-7 | Reserved
[67] The values 5-7 are reserved for private applications. The present disclosure uses the value 6 to indicate that the WuR frame contains semantic object detection data. The address field shall always contain the Transmitter ID or a group ID in SES applications. If the value of Type is 6, every node that receives the frame can initiate further processes for static environment applications. Any value other than 6 will lead to the respective interrupt service routines. As described earlier, in one or more embodiments, the wireless communication standard is IEEE802.11ba; however, the wireless communication standard can be any other similar standard without deviating from the scope of the present disclosure.
[68] ID Field Details:

ID Field Identifier | Description
Transmitter ID | Broadcast for non-AP STAs associated with the Transmitter
Non-Transmitter ID | Broadcast for non-AP STAs associated with the corresponding ID
WuR Group ID | Identifies a group of one or more WuR non-AP STAs and is selected from a WuR group ID space
WuR ID | Identity of the particular STA intended to receive the WuR frame
OUI1 | 12 MSBs of the OUI in a vendor-specific frame

SES Event Frame - IEEE802.11ba Standard

[69] As can be observed from figure 4, the event frame 402 in the prior art is composed of the following:
● Type: 3 bits, set to 6 (110) to signify SES applications.
● Protected: 1 bit.
● Frame Body Present: 1 bit, set to 1; all SES event-based wake-up frames carry 1 byte of data.
● Length/Miscellaneous: absent, since Frame Body Present is always set to 1.
● Address: 12 bits, always containing the Transmitter ID or a group ID (ID field).
● Type-dependent control: identical to the WuR wake-up frame format (see the type-dependent field in the IEEE802.11ba standard).
● Frame Body: 8 bits of semantic-based object information.
● FCS (Frame Check Sequence): 16 bits.
[70] Figure 5 illustrates a modified wake-up frame format 116 in accordance with an embodiment of the present disclosure. The frame format 116 is modified keeping SES applications in mind; for such applications, some of the bits sent in the standard format become redundant. The Type field and Protected are identical to the standard, whereby the Type field is allocated 3 bits and set to 6, and Protected is allocated 1 bit. Since the Frame Body is always present, the frame is optimized by not including Frame Body Present as a part of the modified event frame. Just like the standard, Length/Miscellaneous is not present. The Address field is present, takes 12 bits, and serves the same purpose as in the standard. Type-dependent control has been excluded, as a short wake-up format is considered. The Frame Body has been allocated 8 bits and contains semantic-based object information. The FCS, in this case, is not included. If the Type value is set to 6, the radio has to be woken up. Even if the type bit is erroneous, the SES events are to be triggered: false positives carry a lesser penalty than false negatives, since in the latter case there is a chance of missing necessary captured information. In such a case, there will be a length mismatch of the packet; the receiving node can identify this, send a negative acknowledgment frame, and return to sleep mode. The Tx node will then resend the WuR frame. Negative acknowledgments can also be sent if the correction bit in the data is mismatched.
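A bit-level sketch makes the savings concrete. The field widths below (Type 3 + Protected 1 + Address 12 + Frame Body 8 = 24 bits, versus 53 bits for the standard event frame) come from the description above; the MSB-first ordering and the function name are assumptions for illustration.

```python
# Sketch of packing the modified 24-bit SES event frame of Figure 5.
# Field widths follow the description above; the MSB-first ordering is
# an assumption for illustration, not mandated by the disclosure.
SES_TYPE = 6  # reserved Type value (0b110) signalling SES applications

def pack_modified_wur_frame(protected: int, address: int, body: int) -> str:
    assert 0 <= address < (1 << 12) and 0 <= body < (1 << 8)
    bits = (f"{SES_TYPE:03b}"      # Type: 3 bits, set to 6
            f"{protected & 1:1b}"  # Protected: 1 bit
            f"{address:012b}"      # Address: Transmitter ID or group ID
            f"{body:08b}")         # Frame Body: 8 bits of object information
    assert len(bits) == 24         # vs. 53 bits for the standard event frame
    return bits

# Example: pack_modified_wur_frame(protected=0, address=0xABC, body=0b10010001)
```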
[71] Coming back to Figure 3, the obtained extract of the region of interest is then converted 108 to the bitstream, as depicted by the block 306. The bitstream is then channel encoded and modulated, as depicted by the blocks 308 and 310. After the bitstream is encoded and modulated, it is transmitted 110. The bitstream can be transmitted either along with or after the transmission of the wake-up signal. At the receiver's side, at the outset, a wake-up signal is received 202 by the wake-up receiver, as depicted by the block 316. The wake-up signal includes the wake-up frame of the optimized frame format 116 as discussed in figure 5. After the wake-up signal is received 202, an interrupt is generated 204 to turn on the primary receiver of the WSN node.
[72] The primary receiver thereafter receives 206 the bitstream of the extracted region of interest. The bitstream is then demodulated 208, as shown in the block 318, and channel decoding of the demodulated bitstream is then initialized 210, as depicted by the block 320. The extracted region of interest is then recovered by parsing 212 the decoded and demodulated bitstream of the extracted region of interest, as depicted by the block 322. Parsing the decoded and demodulated bitstream of the extracted region of interest may comprise obtaining the coordinates and pixels of the region of interest. Finally, the video frame is reconstructed 214 by placing the recovered region of interest at an appropriate position in the static environment background, as illustrated by the blocks 324 and 326.
[73] In order to properly reconstruct the region of interest, the extracted image has to be put back into its original position within the background. There is a need to send the coordinates of the bounding box as well as its width and height. A basic assumption is that the surveillance footage has a resolution of 512 x 512. This ensures that only 9 bits are needed to represent each of the coordinates, the width, and the height of the bounding boxes. In the region of interest extract, each pixel requires 8 bits. The parsing scheme followed is that the first 36 bits represent the coordinates, width, and height in the fashion [x, y, w, h]. The rest of the bits represent the flattened region of interest extract. This is then channel encoded using polar codes and transmitted.
[74] Once the bitstream is received, the coordinates, width, and height are obtained from the first 36 bits. The rest of the bits are converted back to the 'uint8' format to obtain each pixel. Thus, a flattened vector is obtained, which is then resized to the tuple (height, width) to reconstruct the region of interest extract. The reconstructed region of interest extract is blended in at the obtained coordinates so as to reconstruct the original image intended to be transmitted. This results in an image almost identical to the original image.
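The pack/unpack pair below sketches this parsing scheme, assuming single-channel (grayscale) 512 x 512 frames so that each of x, y, w and h fits in 9 bits (2^9 = 512). The bit ordering and function names are illustrative assumptions, and the channel coding (polar codes) is omitted.

```python
import numpy as np

# Sketch of the 36-bit-header parsing scheme: [x, y, w, h] at 9 bits each,
# followed by 8 bits per pixel of the flattened RoI extract.

def pack_roi(roi: np.ndarray, bbox) -> str:
    x, y, w, h = bbox
    header = "".join(f"{v:09b}" for v in (x, y, w, h))        # 36-bit header
    pixels = "".join(f"{int(p):08b}" for p in roi.flatten())  # 8 bits/pixel
    return header + pixels

def unpack_and_blend(bits: str, background: np.ndarray) -> np.ndarray:
    x, y, w, h = (int(bits[9 * i:9 * (i + 1)], 2) for i in range(4))
    pix = [int(bits[j:j + 8], 2) for j in range(36, 36 + 8 * w * h, 8)]
    roi = np.array(pix, dtype=np.uint8).reshape(h, w)  # (height, width)
    frame = background.copy()
    frame[y:y + h, x:x + w] = roi  # blend the RoI back at its original spot
    return frame
```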
[75] It is to be noted that, in this specific application, the background remains constant and, hence, is the shared knowledge between the transmitter and receiver. The image of the static background is transmitted periodically, but not continuously. The extract of the region of interest is then blended in at the exact location in the background, hence recreating the original image. The data payload contains 8 bits, indicating detection information. The first four bits indicate the presence or absence of the objects mentioned above, in that order (a bit set to 1 indicates the presence of that particular object). The next three bits are reserved and can be used for more classes. The last bit is the OR of the first four bits (providing error-detection capability).
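A sketch of this 8-bit payload follows; the MSB-first placement of the four presence bits and the function names are assumptions for illustration.

```python
# Sketch of the 8-bit detection payload: four presence bits for
# [human, pet, vehicle, package], three reserved bits, and a final bit
# equal to the OR of the first four (error detection).

def encode_payload(human: bool, pet: bool, vehicle: bool, package: bool) -> int:
    presence = (int(human) << 7) | (int(pet) << 6) \
             | (int(vehicle) << 5) | (int(package) << 4)
    or_bit = int(human or pet or vehicle or package)  # error-detection bit
    return presence | or_bit  # reserved bits 3..1 remain zero

def payload_consistent(byte: int) -> bool:
    # The final bit must equal the OR of the four presence bits.
    return bool((byte >> 4) & 0xF) == bool(byte & 1)
```

A receiver can use the final bit as a cheap consistency check before acting on the payload.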
[76] There could be cases where the model fails to detect the region of interest, which enforces the need to have a fallback option. In such cases, the whole image is sent. The wake-up frame is then triggered, given that a foreign object enters the sensor's vicinity. At the sensing end, it is imperative to identify familiar objects to prevent raising false flags. The payload of the frame contains information about the objects of interest. This facilitates future action based on the object that has triggered the sensor.
[77] Figure 6 illustrates an exemplary system 600 facilitating a method of semantic transmission and receiving of the video frame setup in accordance with an embodiment of the present disclosure.
[78] The system 600 includes, but is not limited to, processors, memory elements, one or more sensors, and processing devices. In an aspect, the processor(s) may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) are configured to fetch and execute computer-readable instructions stored in the memory of the system 600.
[79] Further, in an aspect, the memory may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share data units over a network service. The memory may include any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like. Furthermore, in another aspect of the present invention, the system 600 includes a transmission module 602 to perform the transmission 100 functionalities denoted in Figure 1 and the transmission side 300T functionalities denoted in Figure 3. The system may further comprise a receiving module 604 to perform the reception 200 functionalities delineated in Figure 2 and the receiver side 300R functionalities delineated in Figure 3.
[80] In an aspect, the processing device(s) may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing device(s). In an embodiment of the present disclosure, the processing device may comprise a processing module 606. In the examples described herein, such combinations of hardware and programming may be implemented in several different ways. In one example, the programming for the processing device(s) may be processor-executable instructions stored on a non-transitory machine-readable storage medium, and the hardware for the processing device(s) may include a processing resource (for example, one or more processors) to execute such instructions. In other examples, the processing device(s) may be implemented by electronic circuitry.
[81] The system 600 and its constituent modules 602, 604 and 606 may employ machine learning and artificial intelligence based techniques for their respective functionalities. Machine learning may be applied to build a model based at least on sample data, known as "training data," in order to make predictions or decisions without being explicitly programmed to do so.
[82] Each of the modules 602, 604 and 606 may implement a particular machine learning model for its respective functionality. In another example, each of the plurality of modules 602, 604 and 606 may include an ensemble of one or more machine learning models (e.g., Multilayer Perceptron, Support Vector Machines, Bayesian learning, K-Nearest Neighbor) to process input data and, based thereupon, draw predictions for the respective module. The input data for any of the modules 602, 604 and 606 may be current data received in real time from data sources such as a local or remote database, for example a data feed database. The input data for each module 602, 604 and 606 may also be processed data received from the other modules 602, 604 and 606.

Experimental Results And Advantages
[83] Figure 7 illustrates graphical representations of test results by the baseline, semantic segmentation, and object detection methods, in accordance with an embodiment of the present disclosure. As can be observed from figure 7, the metrics used for comparing the resultant reconstruction and the ground truth images are the Normalized Mean Squared Error (NMSE), Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity index (SSIM), and the percentage of bits saved.
[84] The average statistics for different extracting methods are listed below.
Method to get RoI | NMSE | PSNR | MSSIM | Bits saved (%)
Semantic Segmentation | 3.912 | 36.981 | 0.784 | 54.415
Object Detection | 5.412 | 18.377 | 0.632 | 86.718
[85] Based on the above table, using the image comparison metrics, it can be observed that the semantic segmentation method outperformed the object detection method. However, it can also be observed that the percentage of bits saved is much higher for the object detection method than for the semantic segmentation method. This is due to the following issue, which was resolved by the fallback option: for many images within the dataset, the semantic segmentation method fails to identify the region of interest and hence sends the full image itself, which boosts its reconstruction metric results. For the object detection method, by contrast, up to a 95% average reduction is obtained when a single person is present, over all the backgrounds in the dataset. These results show that the object detection method is more effective and better suited for the task at hand.
[86] Figure 8 illustrates the performance of the modified wake-up frame format against the standard frame format, in accordance with an embodiment of the present disclosure. The transmitter's average power consumption is around 665.27 μW per transmission when the IEEE802.11ba frame format is used. However, if the customized frame format is employed, around 614.64 μW is used, implying a saving of around 50 μW per transmission.
[87] Advantages-
● The disclosed method of extracting the region of interest reduces the transmitted bits, on average, by 87% and 53% using the object detection and segmentation models, respectively, in comparison to transmitting the full image for the given dataset.
● The modification of the existing frame format of the IEEE802.11ba standard reduces the number of bits used per packet from 53 to 24 bits (a 54.7% reduction). Simulations point to a power reduction of nearly 50 μW per transmission.
● On simulating the performance of the customized SES frame format against the standard frame format, we observe that the former uses 54.7% fewer bits.
[88] It should be noted that the description and figures merely illustrate the principles of the present subject matter. It should be appreciated by those skilled in the art that the conception and specific embodiments disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present subject matter. It should also be appreciated by those skilled in the art that various systems may be devised that, although not explicitly described or shown herein, embody the principles of the present subject matter and are included within its spirit and scope. Furthermore, all examples recited herein are principally intended expressly to be for pedagogical purposes to aid the reader in understanding the principles of the present subject matter and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. The novel features which are believed to be characteristic of the present subject matter, both as to its organization and method of operation, together with further objects and advantages, will be better understood from the following description when considered in connection with the accompanying figures.
[89] Although embodiments for the present subject matter have been described in language specific to package features, it is to be understood that the present subject matter is not necessarily limited to the specific features described. Rather, the specific features and methods are disclosed as embodiments for the present subject matter. Numerous modifications and adaptations of the system/device of the present invention will be apparent to those skilled in the art, and thus it is intended by the appended claims to cover all such modifications and adaptations which fall within the scope of the present subject matter.
[90] It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to inventions containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an" should typically be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, typically means at least two recitations, or two or more recitations). Furthermore, in those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances, where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" will be understood to include the possibilities of "A" or "B" or "A and B."
[91] It will be further appreciated that functions or structures of a plurality of components or steps may be combined into a single component or step, or the functions or structures of one step or component may be split among plural steps or components. The present invention contemplates all of these combinations. Unless stated otherwise, dimensions and geometries of the various structures depicted herein are not intended to be restrictive of the invention, and other dimensions or geometries are possible. In addition, while a feature of the present invention may have been described in the context of only one of the illustrated embodiments, such feature may be combined with one or more other features of other embodiments, for any given application. It will also be appreciated from the above that the fabrication of the unique structures herein and the operation thereof also constitute methods in accordance with the present invention. The present invention also encompasses intermediate and end products resulting from the practice of the methods herein. The use of "comprising" or "including" also contemplates embodiments that "consist essentially of" or "consist of" the recited feature.

Claims:
I/We claim-

1. A method (100) of semantic transmission of a video frame captured by a Wireless Sensor Network, WSN, node in a Static Environment Surveillance, SES, the method (100) comprising:
detecting (102) an object of interest in the video frame;
upon (104) detection of the object of interest in the video frame,
identifying and extracting a region of interest in the video frame;
generating (106) a wake-up signal detectable by a wake-up receiver at another WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format;
converting (108) the extracted region of interest to a bitstream; and
transmitting (110) the bitstream of the extracted region of interest after transmitting the wake-up signal;
wherein the optimized frame format comprises a plurality of optimized frame fields for containing non-redundant and critical data, the plurality of optimized frame fields are a proper subset of a plurality of standard fields defined by a predetermined wireless communication standard.
2. The method (100) as claimed in claim 1, wherein one or more fields of the plurality of standard fields, containing redundant and less critical data are excluded.
3. The method (100) as claimed in claim 1, wherein the plurality of optimized frame fields comprise a message type field identifying a message type of the wake-up frame, a protected field, an address field identifying an address of the wake-up frame, and a frame body field identifying data of the wake-up frame.
4. The method (100) as claimed in claim 1, wherein the wireless communication standard is IEEE802.11ba.
5. The method (100) as claimed in claim 1, wherein the extraction of the region of interest is performed by using deep-learning-based techniques comprising semantic segmentation or object detection.
6. The method (100) as claimed in claim 1, wherein the object of interest comprises humans, pets, vehicles, and packages.
7. The method (100) as claimed in claim 1, wherein the wake-up frame comprises information on the object of interest.
8. The method (100) as claimed in claim 5, wherein the semantic segmentation includes classifying each pixel in the video data, identifying pixels pertinent to the region of interest, and obtaining a maximum and a minimum value of a coordinate of the identified pixels as a bounding box coordinate of the region of interest.
9. The method (100) as claimed in claim 5, wherein the object detection includes determining objects in the video data, identifying pixels in the objects pertinent to the region of interest, and obtaining a bounding box coordinate of the region of interest based on a coordinate of the identified pixels.
10. A method (200) of semantic receiving of a video frame by a Wireless Sensor Network, WSN, node in a Static Environment Surveillance, SES, the method (200) comprising:
receiving (202) a wake-up signal by a wake-up receiver of the WSN node, the wake-up signal comprising a wake-up frame of an optimized frame format indicating a detection of a region of interest;
generating (204) an interrupt to turn on a primary receiver of the WSN node;
receiving (206), by the primary receiver, the bitstream of the extracted region of interest;
demodulating (208) the bitstream of the extracted region of interest;
initializing (210) channel decoding of the demodulated bitstream of the extracted region of interest;
parsing (212) the decoded and demodulated bitstream of the extracted region of interest to recover the cropped region of interest; and
reconstructing (214) the video frame by blending the recovered region of interest at an appropriate position in a static environment background.
11. The method (200) as claimed in claim 10, wherein parsing the bitstream file comprises obtaining the coordinates and pixels of the region of interest.
12. The method (200) as claimed in claim 10, wherein the static environment background is received in advance, and is exchanged periodically.
13. The method (200) as claimed in claim 10, wherein the appropriate position is determined based on a bounding box coordinate of the region of interest.

Documents

Name | Date
202441087158-EVIDENCE OF ELIGIBILTY RULE 24C1f [13-11-2024(online)].pdf | 13/11/2024
202441087158-FORM 18A [13-11-2024(online)].pdf | 13/11/2024
202441087158-FORM-26 [13-11-2024(online)].pdf | 13/11/2024
202441087158-FORM-9 [13-11-2024(online)].pdf | 13/11/2024
202441087158-COMPLETE SPECIFICATION [12-11-2024(online)].pdf | 12/11/2024
202441087158-DECLARATION OF INVENTORSHIP (FORM 5) [12-11-2024(online)].pdf | 12/11/2024
202441087158-DRAWINGS [12-11-2024(online)]-1.pdf | 12/11/2024
202441087158-DRAWINGS [12-11-2024(online)].pdf | 12/11/2024
202441087158-EDUCATIONAL INSTITUTION(S) [12-11-2024(online)].pdf | 12/11/2024
202441087158-EVIDENCE FOR REGISTRATION UNDER SSI(FORM-28) [12-11-2024(online)].pdf | 12/11/2024
202441087158-FIGURE OF ABSTRACT [12-11-2024(online)].pdf | 12/11/2024
202441087158-FORM 1 [12-11-2024(online)].pdf | 12/11/2024
202441087158-FORM FOR SMALL ENTITY(FORM-28) [12-11-2024(online)].pdf | 12/11/2024
202441087158-PROOF OF RIGHT [12-11-2024(online)].pdf | 12/11/2024
