Optical localization of passive UHF RFID tags with integrated LEDs

The ability to accurately localize passive UHF RFID tags in uncontrolled and unstructured environments is limited by multipath propagation. Therefore, in order to increase the spatial resolution of RF based localization methods we propose to combine them with additional sensing capabilities. In this work we enhance passive UHF RFID tags with LEDs, using the Wireless Identification and Sensing Platform (WISP). This allows both humans and computer systems (with cameras) to optically locate tagged items with millimeter accuracy. In order to show the effectiveness of this approach, a PR2 robot is equipped with an EPC Gen2 RFID reader and camera. Using the RFID reader alone, the PR2 is able to identify and coarsely locate tagged items in an unstructured environment. Once the robot has navigated to the vicinity of the LED-enhanced passive RFID tags, it uses the optical location method to precisely locate and autonomously grasp tagged items from a table.


I. INTRODUCTION
RFID technology promises to enable an electronically identifiable world, in which individual objects can be automatically identified and located in unstructured and uncontrolled environments. Reading a conventional RFID tag today provides only coarse location information, since a read event indicates merely that the tag is in range of the reader. For many applications, more precise location information is desirable. For example, a natural extension of the "inventory" application that has driven the design of RFID protocols and hardware is item retrieval. In this application, a human or robot needs to find the location of the item precisely enough to retrieve it, even if there are many other tagged objects nearby. Another variant is location assurance, in which the system must verify that merchandise, safety equipment, or medical equipment is in a pre-specified location.
In an effort to provide more precise tag localization, many researchers have explored more sophisticated RF-based techniques, for example by examining the signal strength measured between the tag and multiple reader antennae. This paper proposes a more precise, robust, and reliable technique, in which the passive tag is augmented with an LED, and the reader is augmented with a camera that is carefully synchronized to the tag flash. In the subsection below, we review previous tag localization efforts. We believe that the method presented here provides a better combination of precision and robustness to multi-path effects in uncontrolled environments, while still using entirely passive (battery-free) tags.

A. Prior work on RFID tag localization
Initial work on RFID tag localization utilized the digital nature of the tag response to estimate distance. We will use "the binary method" to refer to the implicit tag localization that occurs with every read event. This term is used because every tag is either inside or outside the interrogation field of the RFID reader. The problem with this method is that the spatial resolution is limited to ~10 meters, and even that coarse information is ambiguous due to reflection and multipath effects. A modification of the binary method is to control the output power level of the reader, which allows simple range estimation when the tag is determined to be on the edge of the detection zone. It has been shown that this method, used along with a directional antenna, can provide 0.5-1.0 meters of spatial resolution under most circumstances [1]. In a further refinement, [2] combined many separate "binary" tag read events to localize a moving, robot-mounted reader with two antennae.
More sophisticated techniques use the RFID reader's ability to measure the RF channel and tag modulation properties in the form of received signal strength and phase information. In the best case scenarios position accuracy can be on the order of several centimeters [3], [4]. However, the success of these techniques depends on how well controlled the RF environment is so that multi-path effects are eliminated or at least well defined. In many real world scenarios the use of anechoic chambers and well-structured portals is not feasible.
Since multi-path phenomena can drastically affect RF-based methods (both magnitude and phase), there is an inherent tradeoff between how well structured the tag's environment is and how accurately the location of a tag can be estimated. One notable exception was demonstrated by Meisen et al., whose method moves the reader antenna along a known trajectory while repeatedly measuring a static tag [5]. In a sense this method (like [2]) gains accuracy at the cost of acquisition time.
Finally, another proposed technique combined a video projector and a light-detecting tag to localize objects [6]. That work used a large, battery-powered (active) tag, and the required projector represents a substantial increase in complexity and power for the mobile system.

B. Proposed work compared to prior work
This paper proposes augmenting passive RFID tags with LEDs, which allows both humans and computer systems (with cameras) to optically locate the item. This method shifts some of the technical burden of localization off of the RFID reader and back to the tag. Ultimately, this will reduce the total system complexity and increase accuracy.
Under this paradigm a mobile RFID reader (either human-borne or robot-mounted) can use conventional RFID localization techniques, such as power modulation and received signal strength, to coarsely locate a tagged item. Once the reader is in the general vicinity of the passive tag, the tag's LED can be individually addressed and commanded to flash. Even though this light pulse is brief, both humans and camera systems can reliably see it, and the location of the tag can be estimated to within a few millimeters.
Although this technique does require line-of-sight from the human/camera to the tag, this layered approach of RF and optical localization combines the best of both worlds. Furthermore, for a majority of usage scenarios that involve individual item identification as well as grasping, manipulating, and sorting, line-of-sight between the tagged objects and the human/robot is implicit.

II. LED ENHANCED PASSIVE RFID TAG
In order to quickly build and evaluate the performance of an optical localization system based on passive RFID tags enhanced with LEDs, the Wireless Identification and Sensing Platform (WISP) was chosen for prototyping. The WISP is a programmable battery-free sensing and computational platform designed to explore sensor-enhanced RFID applications. The WISP uses a 16-bit, ultra-low-power microcontroller to emulate the EPC Gen2 protocol and performs sensing and computation tasks while operating exclusively from harvested RF energy. A full discussion of the WISP's design and performance is presented in [7].
The operation of the LED-enhanced RFID tag is straightforward. In its default mode the tag acts as a standard EPC Gen2 tag. When the RFID reader issues a special command the tag flashes its LED. In this work we used a standard off the shelf RFID reader and triggered the LED by using the EPC "Write" command to write to a specific location in memory.
In order to ensure that the brightness of the LED is not dependent on the distance from the reader (i.e. the received instantaneous RF power), the tag stores a small amount of charge on a capacitor and discharges that fixed amount of energy into the LED. Therefore, as long as the tag can be powered, it will produce a strong and consistent LED flash. The trade-off is that the rate of flashing is dependent on range. However, this was not an issue for our human or computer vision tests.
Although RFID ICs with integrated LEDs are not readily available to date, recent work in semiconductor optics has shown promising results. The authors of [8], [9] have developed CMOS-compatible LEDs operating at 2-3 volts. It is important to note that even with a CMOS-compatible process, additional packaging and charge storage issues would have to be overcome. Alternatively, a three-component solution consisting of a custom RFID IC, a surface mount LED, and a small surface mount capacitor could easily be implemented today if so desired. One possibility is to extend the functionality of the RFID IC mounting strap, which is widely used in the RFID industry, to include the additional components. For instance, the strap would consist of an ultra-thin PCB onto which the IC, LED, and capacitor are mounted. The enhanced strap could then be bonded to the RFID antenna in the same high-volume manner typically used in the RFID manufacturing process.
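The fixed-energy flash can be illustrated with a short calculation. The sketch below is not taken from the WISP design: the capacitor value, the voltage window, and the harvested power levels are hypothetical numbers chosen only to show that the energy per flash stays constant while the achievable flash rate falls as harvested power drops with range.

```python
def flash_energy_j(capacitance_f, v_start, v_stop):
    """Energy (joules) released as the capacitor discharges from v_start to v_stop."""
    return 0.5 * capacitance_f * (v_start ** 2 - v_stop ** 2)


def max_flash_rate_hz(harvested_power_w, energy_per_flash_j):
    """Upper bound on flash rate: the capacitor must be refilled between flashes."""
    return harvested_power_w / energy_per_flash_j


# Hypothetical component values, chosen only for illustration.
CAPACITANCE_F = 10e-6        # 10 uF storage capacitor (assumed)
V_START, V_STOP = 3.0, 2.0   # capacitor voltage before and after a flash (assumed)

energy = flash_energy_j(CAPACITANCE_F, V_START, V_STOP)

# Hypothetical harvested power close to the reader and near the edge of range.
for label, power_w in (("near", 100e-6), ("far", 10e-6)):
    rate = max_flash_rate_hz(power_w, energy)
    print(f"{label}: {energy * 1e6:.1f} uJ per flash, up to {rate:.2f} flashes/s")
```

With these assumed numbers every flash carries the same energy, and only the time needed to recharge changes with distance from the reader.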

III. VISUAL FEEDBACK AIDED LOCALIZATION
One of the most common tasks in RFID-enabled asset management and shipping applications is the retrieval of tagged items by personnel. In this scenario the combination of traditional RF-based localization and optical localization can provide an effective solution. Figure 1 shows a block diagram of the process.
First the tagged item is coarsely located using either the binary (read / no read) method or a combination of RF power control and received signal strength to estimate tag range. These methods are robust enough that mobile, handheld readers can be used to locate passive tags to within 1-2 meters in unstructured multi-path environments.
Next the RFID reader issues a command to repeatedly flash the tag's LED. In our implementation, after receiving the write command the WISP goes into a low power state for 20 ms in order to harvest additional power and ensure that the capacitor is sufficiently charged. From a human's perspective the tag flash is nearly instantaneous after the write command is issued.
This visual indicator is very effective at aiding personnel in quickly locating tagged objects. As an example, figure 2 shows books tagged with the LED-WISP prototypes on a bookshelf. Once the tag is flashing, the targeted book is clearly identifiable from nearly any location in our lab environment. Finally, it is important to note that there is no noticeable difference between the WISP's read range and its LED flash range, which is ~4 meters.

IV. OVERVIEW OF CAMERA SYNCHRONIZATION AND SYSTEM ARCHITECTURE
Using the LED-WISP, an automated system for locating tags with an RFID reader and camera has been developed. The system is able to query and locate a given LED-WISP by its EPC ID, and it calculates a direction vector to that tag along with a confidence score. Once the general vicinity of the RFID tags is identified using standard RF localization techniques, the camera takes two images of each LED RFID tag: one with the LED illuminated and one with the LED off. These images are used to create a difference map, and since the only change between the images is the LED flash, it is easy to identify the pixel location of the target LED. This pixel location corresponds to the direction vector emanating from the center of the camera towards the RFID tag. In order to determine range information, several techniques have been employed, as described in sections V, VI, and VII. The remainder of this section will focus on the methods and system architecture developed for capturing and computing the individual direction vectors for a population of tags.
Fig. 3. System Architecture. The "EPC Protocol Sniffer" is a WISP whose state mirrors that of the target LED WISP; it triggers the camera capture to ensure synchronization.
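The mapping from a pixel location to a direction vector can be sketched as follows. This is a minimal illustration that assumes an ideal pinhole camera with the principal point at the image center and no lens distortion; the focal length in pixels (`focal_px`) and the function names are hypothetical and are not taken from the system described here.

```python
import math


def pixel_to_direction(u, v, width, height, focal_px):
    """Unit vector from the camera center through pixel (u, v).

    Camera frame (assumed): x to the right, y down, z along the optical axis.
    """
    x = u - (width - 1) / 2.0
    y = v - (height - 1) / 2.0
    norm = math.sqrt(x * x + y * y + focal_px * focal_px)
    return (x / norm, y / norm, focal_px / norm)


def direction_to_angles(direction):
    """Return (theta, phi): azimuth and elevation in radians."""
    dx, dy, dz = direction
    theta = math.atan2(dx, dz)                  # positive to the right
    phi = math.atan2(-dy, math.hypot(dx, dz))   # positive upward (image y points down)
    return theta, phi


# Example: a 1280 x 512 image and a hypothetical focal length of 1000 pixels.
center = pixel_to_direction(639.5, 255.5, 1280, 512, 1000.0)
offset = pixel_to_direction(639.5 + 1000.0, 255.5, 1280, 512, 1000.0)
print(center, direction_to_angles(offset))
```

A calibrated system would replace the ideal pinhole assumption with measured intrinsics and a distortion model.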

A. Camera Synchronization
The basic task of identifying an illuminated LED in a camera image is fairly straightforward. However, due to the wirelessly powered nature of the LED enhanced RFID tags, it is difficult to synchronize the brief flashes of light with the camera. For the best signal-to-noise ratio, the camera exposure window should coincide as much as possible with the LED flash. In the naive approach of simply attempting to read the LED WISP repeatedly, and taking a picture each time, the large majority of images would contain no flash at all. Similarly, an unsynchronized video camera is not able to reliably capture the ~1 ms pulse of light.
To solve the synchronization problem, the RFID communication channel is used to communicate the tag power level to the reader, and a second "packet sniffer WISP," whose state mirrors that of the LED WISP, is used to trigger the camera. Fig. 3 shows the system architecture, including the sniffer WISP. First, the RFID reader issues a Query command to the target tag. If the target tag has sufficient power to flash the LED, it responds to the reader. The reader then issues a Write command. Upon receiving the Write command, the target tag waits for a fixed time delay, then blinks. The packet sniffer WISP tag is wired to the camera's frame trigger. The sniffer WISP listens to the communication channel, waiting for write commands. When it hears a write command, it waits for the same time delay as the LED WISP (minus an offset for the camera's latency), and then triggers a camera capture (by raising an output line that is wired to the camera's trigger input). It then triggers a second frame capture, in which the LED is guaranteed to be off.
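The timing relationship between the target tag and the sniffer WISP can be sketched numerically. In the example below all times are measured from the Write command. The 1 ms flash length and the roughly 100 ms frame spacing come from the text; treating the 20 ms charging interval from Section III as the fixed delay is an assumption, and the camera latency and exposure time are hypothetical values.

```python
def plan_triggers(tag_delay_ms, camera_latency_ms, frame_gap_ms):
    """Times (ms after the Write command) at which the sniffer raises the trigger line."""
    first = tag_delay_ms - camera_latency_ms   # compensate for the camera's latency
    second = first + frame_gap_ms              # second frame: LED is off again
    return first, second


def exposure_window(trigger_ms, camera_latency_ms, exposure_ms):
    """Interval during which the sensor actually integrates light."""
    start = trigger_ms + camera_latency_ms
    return start, start + exposure_ms


def overlap_ms(a, b):
    """Length of the overlap between two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


TAG_DELAY_MS = 20.0        # assumed equal to the 20 ms charging interval
FLASH_MS = 1.0             # approximate flash length from the text
FRAME_GAP_MS = 100.0       # approximate spacing of the two frames from the text
CAMERA_LATENCY_MS = 0.5    # hypothetical
EXPOSURE_MS = 1.0          # hypothetical

flash = (TAG_DELAY_MS, TAG_DELAY_MS + FLASH_MS)
t_on, t_off = plan_triggers(TAG_DELAY_MS, CAMERA_LATENCY_MS, FRAME_GAP_MS)
lit = overlap_ms(exposure_window(t_on, CAMERA_LATENCY_MS, EXPOSURE_MS), flash)
dark = overlap_ms(exposure_window(t_off, CAMERA_LATENCY_MS, EXPOSURE_MS), flash)
print(f"LED-on frame overlaps the flash for {lit} ms; LED-off frame for {dark} ms")
```

Under these assumptions the first exposure fully covers the flash and the second contains none of it, which is the condition the difference image relies on.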

B. System Architecture
The system architecture for tag acquisition, camera synchronization, and data processing is depicted in figure 4. The function and interaction of these blocks are as follows:
• The Host: The host configures the reader to issue a write command to the target tag. It then waits for a response from the camera. If a response is received, the host verifies that a write command was issued to the target tag while waiting, and the location is returned. It also keeps track of pings sent out by the camera to monitor that the camera is operating correctly.
• The Smart Camera: The camera is triggered by the sniffer WISP. When triggered, it captures an image. The camera collects images in pairs (one LED on, one off), performs the difference computation on board, and locates the maximum brightness change. It broadcasts its results over UDP. Currently these results consist of the location and magnitude of the brightest pixel from the difference image, and the magnitude of the most negative pixel. For our system, an NI 1764 smart camera was used for its high resolution, external frame trigger, and on-board processing capability.
• The RFID Reader: The reader issues write commands as requested by the host, and provides power for the target tag. It also informs the host when it has performed the requested write commands.
• The Sniffer WISP: The sniffer WISP sends a pair of triggers to the camera whenever it observes a write command from the reader. These are timed to allow the camera to capture the blink and the moment immediately after it.
• The Target Tag: The target tag charges itself from the reader. If it has enough power to blink, it will respond to the Query command. Once it receives a Write command, it waits a fixed delay, then blinks.
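The per-pair computation performed on the camera can be sketched in a few lines. The example below uses plain nested lists as stand-ins for grayscale frames and returns the three quantities the camera reports. The `confidence` function is a hypothetical illustration only: the scoring rule used by the actual system is not specified here, and the ratio shown simply treats a large negative pixel as evidence that something other than the LED changed between frames.

```python
def summarize_difference(frame_on, frame_off):
    """Return location and magnitude of the brightest difference pixel,
    plus the magnitude of the most negative difference pixel."""
    best = None          # (difference, row, col) of the largest increase
    most_negative = 0    # most negative difference seen so far
    for r, (row_on, row_off) in enumerate(zip(frame_on, frame_off)):
        for c, (a, b) in enumerate(zip(row_on, row_off)):
            d = a - b
            if best is None or d > best[0]:
                best = (d, r, c)
            if d < most_negative:
                most_negative = d
    peak, row, col = best
    return {"row": row, "col": col, "peak": peak, "most_negative": most_negative}


def confidence(summary):
    """Hypothetical score in [0, 1]; NOT the score used by the described system."""
    peak = max(summary["peak"], 0)
    noise = abs(summary["most_negative"])
    return peak / (peak + noise) if (peak + noise) > 0 else 0.0


# Tiny synthetic frames: the LED brightens pixel (1, 2); one pixel dims slightly.
frame_off = [[10, 10, 10], [10, 10, 10]]
frame_on = [[10, 12, 10], [9, 10, 200]]
summary = summarize_difference(frame_on, frame_off)
print(summary, round(confidence(summary), 3))
```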

C. Theoretical Camera Precision
The system provides a very precise direction vector towards the tag. For a tag in a fixed location, the system consistently chooses the same pixel: the average deviation from the mean position was only 0.16 pixels. This means the precision is limited by the camera's resolution and lens distortion, not by errors in measurement consistency. The current system uses the NI 1764 camera in 1280 by 512 mode, which makes the resolution 1.3 mm per meter of distance to the target. The LED WISPs used operated out to a range of about 3.5 meters, making the maximum precision error from the resolution only 2.6mm.
Finally, it should be noted that none of the cameras used have been calibrated to correct for the optical distortion of the lens. Although advanced camera calibration is commonly used in computer vision, it is beyond the scope of this work, which focused on proving the functionality and utility of LED enhanced passive UHF RFID tags. Thus the results presented here represent a lower bound on location performance.
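The relationship between pixel resolution and range can be made concrete with a small calculation. The sketch below takes the two figures quoted above (1.3 mm per meter per pixel and 0.16 pixels of average deviation) and scales them linearly with range; the function names are illustrative, and the linear scaling assumes a small-angle, distortion-free camera.

```python
MM_PER_METER_PER_PIXEL = 1.3   # from the text: one pixel spans 1.3 mm per meter of range
PIXEL_JITTER = 0.16            # from the text: average deviation from the mean, in pixels


def pixel_footprint_mm(range_m):
    """Width of the scene covered by one pixel at the given range."""
    return MM_PER_METER_PER_PIXEL * range_m


def jitter_mm(range_m):
    """Scene-space size of the measured pixel jitter at the given range."""
    return PIXEL_JITTER * pixel_footprint_mm(range_m)


for range_m in (1.0, 2.0):
    print(f"{range_m} m: {pixel_footprint_mm(range_m):.2f} mm per pixel, "
          f"{jitter_mm(range_m):.3f} mm jitter")
```

For example, at the 2 m camera-to-shelf distance used in Section V, one pixel covers 2.6 mm and the measured jitter corresponds to well under a millimeter, which is why resolution rather than repeatability dominates.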

V. PLANAR 3D RFID TAG LOCALIZATION USING MARKER TAGS
One common application scenario for RFID localization is the automated identification and localization of tagged inventory on shelves. This can take the form of warehouse storage, retail displays, and/or hospital supply rooms, where it is important to ensure that items are in the correct location so they can be quickly and easily accessed. The advantage of the shelving scenario is that line-of-sight from the RFID reader / camera system is generally implicit, as long as the tags are placed on the outward-facing surface of the item. Figure 5 shows an image captured during the localization process, which consists of tagged books and boxes on a metal shelf in a lab environment. The LED enhanced WISPs are marked with red arrows. It is important to remember that in these images it is difficult for humans to readily identify the one or two pixels that represent an LED flash. The camera takes two images (the first with the LED on, the second with it off) approximately 100 ms apart and then computes the difference map to identify the correct pixel.
In this experiment the RFID reader and single camera are placed 2 meters away from the bookshelf. This location represents a reasonable distance and bearing, from the reader to the tags, that can be achieved using RF only localization methods such as RSSI and power modulated distance estimation.
During the localization process the RFID reader inventories the tags and commands the individual WISPs to blink their LEDs. For each LED flash a single camera captures two images and calculates the position vectors. This is shown in figure 6, where the camera is represented as a green square and the position vectors are shown pointing towards the tags. At this stage the camera has only computed the 2D location of the tags, represented as angles (theta and phi).
Fig. 6. Image of the reconstructed 3D tag locations. The green square represents the camera location along with the tag position vectors measured by a single camera. The black grid represents the estimated plane of the shelf, which is calculated using two known "marker tags". The red dots are the calculated tag positions and the blue circles are the measured locations of the tags.
To extrapolate the 3D position of the RFID tags, two marker tags are placed on the corners of the shelf at known heights from the ground and with a known separation distance. The rationale for the marker tags is that they only need to be installed on the infrastructure once and provide a level of ground truth that helps with tag localization. Furthermore, the marker tags can store the ground truth information in memory so that the RFID system does not have to query a database for it.
If the camera height and its angle from the horizon are known, then the distance from the camera to each of the marker tags can easily be calculated. Once the positions of the corners of the shelf are known, the pose and distance of the shelf relative to the camera can be computed in 3-D space. In figure 6 the plane of the shelf is represented by the black grid.
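One way to obtain the marker distance from the known heights is sketched below. This is an illustrative geometry rather than the exact computation used in the experiment: it assumes a level camera frame, takes the elevation angle `phi` and azimuth `theta` of the ray to the marker as already corrected for camera tilt, and solves for the range at which the ray reaches the marker's known height. The function names and the example heights are hypothetical.

```python
import math


def marker_range_m(camera_height_m, marker_height_m, elevation_rad):
    """Distance along the ray at which it reaches the marker's known height."""
    rise = marker_height_m - camera_height_m
    s = math.sin(elevation_rad)
    if abs(s) < 1e-6:
        raise ValueError("marker at camera height: range not observable from height alone")
    r = rise / s
    if r <= 0:
        raise ValueError("elevation angle is inconsistent with the known heights")
    return r


def marker_position(camera_height_m, marker_height_m, theta_rad, phi_rad):
    """Marker position in a level camera-centered frame (x right, y up, z forward)."""
    r = marker_range_m(camera_height_m, marker_height_m, phi_rad)
    horizontal = r * math.cos(phi_rad)
    return (horizontal * math.sin(theta_rad),
            marker_height_m - camera_height_m,
            horizontal * math.cos(theta_rad))


# Hypothetical example: camera 1.0 m high, marker 2.0 m high, seen 30 degrees above horizontal.
r = marker_range_m(1.0, 2.0, math.radians(30.0))
p = marker_position(1.0, 2.0, 0.0, math.radians(30.0))
print(round(r, 3), tuple(round(c, 3) for c in p))
```

The sketch also shows why this estimate is sensitive to leveling errors: when the marker is close to camera height, a small error in `phi` produces a large error in range.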
Assuming that all tagged items lie on the plane of the bookshelf, it is straightforward to calculate the intersection of the plane and the position vector for the unknown tags. Figure 6 shows the estimated position of the tags as red dots and the ground truth as blue circles. It should be noted that the actual position of the tags was not necessarily in the plane of the shelf. The ground truth measurements (i.e. blue circles) represent the actual 3D location of the tags.
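The plane intersection step can be sketched as a standard ray-plane calculation. The example below makes one assumption beyond the text: the shelf plane is taken to be vertical, so that two marker positions are enough to define it. The helper names, the camera-at-origin frame (x right, y up, z forward), and the sample coordinates are all illustrative.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vertical_plane_through(marker_a, marker_b, up=(0.0, 1.0, 0.0)):
    """Plane containing both markers and the vertical direction (assumed shelf model)."""
    normal = cross(sub(marker_b, marker_a), up)
    return normal, marker_a


def intersect_ray_plane(direction, normal, point_on_plane):
    """Intersection of a ray from the camera origin with the plane."""
    denom = dot(normal, direction)
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the shelf plane")
    t = dot(normal, point_on_plane) / denom
    if t <= 0:
        raise ValueError("shelf plane is behind the camera along this ray")
    return tuple(t * c for c in direction)


# Hypothetical markers 1 m apart on a shelf 2 m in front of the camera.
normal, origin = vertical_plane_through((-0.5, 0.2, 2.0), (0.5, 0.2, 2.0))
tag = intersect_ray_plane((0.1, 0.05, 1.0), normal, origin)
print(tag)
```

Because every tag is projected onto this plane, any tag that actually sits in front of or behind the shelf face picks up an error proportional to its offset from the plane.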
The results of this approach are quantified in table V. The first sets of columns show the localization error for each tag in millimeters. The experiment was done for two camera positions: one facing the shelf and the second rotated and translated off to the side. It is believed that the predominant source of error is inaccuracy in estimating the distance of the marker tags to the camera. Small errors, such as the camera not being level and lens distortion not being calibrated out, cause larger errors in tag location estimation over the ~2 meter distance from the shelf to the camera. Furthermore, it is estimated that the accuracy of the ground truth measurements is only +/-1.6 mm. Tags 410 and 416 are the marker tags. Since the height of these tags is known, their error in the Y (vertical) dimension is zero and this data is not applicable (N/A) for error analysis. Overall, the percent error from the camera position to the location of the tags is between 1% and 3%.

Table V. Columns: Tag ID; Localization Error in mm [X,Y,Z].
* Tags 410 and 416 are "marker tags" which provide a vertical reference point that aids in calculating the 3D location of the shelf and tags relative to the camera.

Although it is believed that the accuracy of this localization scheme can be greatly improved by camera calibration, these results show that it is possible to estimate the position of an LED enhanced passive tag to within approximately 10 mm. To show the utility of this approach, consider the rows of books on the shelf shown in figure 5. On the center top of the shelf are side-by-side books that are tagged with LED enhanced WISPs. Figure 6 and table V clearly show that the relative position of these two books can be determined. This means that in a library situation it is not only possible to identify and coarsely locate books, but with this technique it is also possible to electronically ensure that the books are in the correct order.

VI. FULL 3D TAG LOCALIZATION USING STEREO CAMERAS
There are many classes of RFID localization applications that require full 3D position estimation and thus cannot rely on the planar 3D solution described in the previous section. Examples include the tracking of moving objects, precision navigation based on RFID beacons/markers, and robotic grasping and manipulation of objects in the home setting. To address these applications we propose to use externally triggered stereo cameras. Each camera will simultaneously capture the RFID LED flash and compute its respective direction vector. The intersection of these vectors represents the 3-D location of the tag relative to the cameras.
Stereo cameras are widely used in computer vision applications. However, the addition of individually addressable LED enhanced RFID tags creates several unique benefits. To begin with, one of the major challenges to implementing effective computer vision systems is the identification of corresponding points in the left and right images that are captured by the stereo cameras. This is a very computationally intensive task that frequently fails. Because conventional stereo techniques often fail to find the required point correspondences, the resulting depth images tend to be noisy and contain many regions with no depth data.
Fig. 7. Left and right stereo camera images of tagged objects placed on the table. The LED enhanced passive RFID tags are marked with red dots.
In contrast, since the passive RFID tags presented here can flash their LEDs when commanded, corresponding points can be identified in the two camera images with extremely high reliability. In this scenario, two cameras (left and right) would each use the same synchronization method and pixel map detection technique described earlier, to identify the same LED (i.e. point in space) between the two images.
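Once the same LED has been found in both images, the two direction vectors can be combined into a 3-D point. In practice two measured rays rarely intersect exactly, so the sketch below substitutes a common approximation: it finds the closest points on the two rays and returns their midpoint, along with the residual gap as a sanity check. The function name and the example coordinates are illustrative; only the 200 mm baseline is taken from the experiment described in this section.

```python
import math


def triangulate_midpoint(p1, d1, p2, d2):
    """Midpoint of the shortest segment between rays p1 + s*d1 and p2 + t*d2.

    Returns (point, gap), where gap is the distance between the two rays
    at their closest approach.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * v for p, v in zip(p1, d1))
    q2 = tuple(p + t * v for p, v in zip(p2, d2))
    point = tuple((x + y) / 2.0 for x, y in zip(q1, q2))
    return point, math.dist(q1, q2)


# Cameras 200 mm apart (the baseline used here); the tag position is a made-up example.
left, right = (-0.1, 0.0, 0.0), (0.1, 0.0, 0.0)
true_tag = (0.3, 0.2, 1.5)
ray_left = tuple(t - c for t, c in zip(true_tag, left))
ray_right = tuple(t - c for t, c in zip(true_tag, right))
estimate, gap = triangulate_midpoint(left, ray_left, right, ray_right)
print(estimate, gap)
```

A large gap indicates that the two detections do not correspond to the same point, which can be used to reject a bad pair.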
Another challenge for computer vision systems is the segmentation of individual objects in the images. Although it is possible to find corresponding features from one frame to the next using techniques such as the scale-invariant feature transform (SIFT), object identification is still an open research topic. However, with the use of an RFID reader and LED enhanced RFID tags it is possible to simply query the scene and determine that there are six objects in a given region and that those objects are located at coordinates [X,Y,Z]. Figure 7 shows an image of tagged objects on a table. The LED enhanced WISPs are marked with red dots. In this experiment the checkerboard grid underneath the objects is not used for localization but instead is used to help measure the ground truth position of the tagged objects for later comparison to the calculated distances. Once again the RF environment is unstructured, consisting of metal bookshelves and workbenches that create RF reflections. In fact, the image shows that one of the tagged objects is a metal fire extinguisher. All of these unstructured metal objects make it difficult to locate the individual tags with millimeter resolution if out-of-band sensing mechanisms are not used.
At the time of this publication two externally triggered cameras were not available. Therefore, a single NI 1764 smart camera was simply moved to the left and right stereo camera positions in order to record the stereo data. This may result in some error in computing the tag location, because the baseline distance between the left and right images may vary slightly when the camera is moved from position to position. The baseline for this system (camera separation) is 200 mm. Furthermore, the camera was not calibrated for optical distortion in the lens and image plane. Figure 8 shows the reconstructed 3-D locations of the tags. In this experiment no marker tags are used, thus each tag ID represents a uniquely tagged object. The experiment was repeated for three camera locations, positioned at 0, 16, and 32 degrees as rotated around the vertical axis of the table. The radius was approximately 1.5-2 meters. The location errors for each tag are shown in table VI. The results show that individual tags can be located to within 10-20 mm. The overall percent error from the center of the stereo camera position to the location of the tags is between 1% and 6%.
These results show that it is possible to locate LED enhanced passive RFID tags with greater accuracy than the RF-only methods previously reported. However, further refinements and improvements in location accuracy are still possible using more sophisticated cameras and image processing techniques.

VII. APPLICATION: ROBOTIC GRASPING
Robotic grasping in unstructured human environments has been one of the critical bottlenecks in the development of personal robotics. One of the difficult tasks for personal robots is to locate and grasp a particular object in a cluttered environment. RFID-enabled localization techniques such as RSSI maps have been proposed to facilitate object searching. However, a scan can take significant time to complete, and signal strength fluctuation can cause problems. Therefore, a more efficient and highly precise localization technique is desired. In this section, we demonstrate the use of our proposed system to localize an object carrying a passive UHF RFID tag with an integrated LED, enabling fast and reliable object searching and grasping in a cluttered environment.

A. System Setup
We integrated our localization system with the standardized hardware and software of the Willow Garage PR2 robot. Figure 9 shows the NI 1764 smart camera and an RFID antenna mounted on the PR2's head next to the existing depth sensor (Microsoft Kinect). All host software is implemented as a Robot Operating System (ROS) node that controls the camera, the RFID reader, and the sniffer WISP.

B. Experiment
Our goal was to enable the robot to recognize and locate individual tagged objects clustered on a table at human-like speeds, without servoing for the peak signal in RSSI. The robot finds the target object carrying the LED enhanced passive RFID tag in a pile of objects on the table, and then grasps it. The steps are described below:
1) The RGB-D sensor on the PR2's head (Kinect) is used to create a 3-D point cloud of all the objects on the table. Fig. 10(a) shows the image seen from the Kinect's view. In this example, there are 19 objects on the table. Fig. 10(b) shows the 3-D environment perceived by the robot using the Kinect sensor, visualized by RVIZ, a 3-D visualization tool in ROS. Although the point cloud is segmented into different blobs of points, it is difficult for the robot to identify unique objects.
2) Next the robot initializes the RFID localization method as described in section IV. When commanded, the LED enhanced WISP flashes its LED, and the NI 1764 camera locates the tag and returns a direction vector to ROS. The blue lines in Fig. 10(a) and (b) indicate the direction vector to the LED WISP. Only detections with confidence scores over a certain threshold are used. The robot repeats the detection until a valid detection is found.
3) A ROS service node is used to select the desired object cluster, given all the object clusters and the direction vector. It selects the object closest to the direction vector by finding the centroid of each object on the table and computing the distance between each centroid and the direction vector. Because the Kinect and the NI smart camera have different views, we calibrate the poses of the two camera frames and transform the 3-D point cloud obtained by the Kinect into the smart camera's frame before all the data processing.
4) After the desired object is correctly selected, the robot plans a feasible grasp for the selected object. Fig. 10(c) shows a successful grasp result.
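The cluster selection in step 3 can be sketched in plain Python. This is an illustration of the centroid-to-ray distance test only, not the ROS service itself: the function names are hypothetical, the clusters are assumed to be already transformed into the smart camera's frame, and the direction vector is treated as a ray starting at the camera origin.

```python
import math


def centroid(points):
    """Mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance_to_ray(point, direction):
    """Shortest distance from a point to the ray from the origin along direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    t = max(0.0, sum(p * u for p, u in zip(point, unit)))  # stay in front of the camera
    closest = tuple(t * u for u in unit)
    return math.dist(point, closest)


def select_cluster(clusters, direction):
    """Index of the cluster whose centroid lies closest to the ray, and that distance."""
    distances = [distance_to_ray(centroid(c), direction) for c in clusters]
    best = min(range(len(clusters)), key=distances.__getitem__)
    return best, distances[best]


# Two made-up clusters; the ray points straight along the camera's optical axis.
clusters = [
    [(0.5, 0.0, 1.0), (0.5, 0.0, 1.2)],
    [(0.0, 0.02, 1.0), (0.02, 0.0, 1.2)],
]
index, miss = select_cluster(clusters, (0.0, 0.0, 1.0))
print(index, round(miss, 4))
```

Using centroids keeps the test cheap, but an elongated object whose tag sits far from its centroid could be missed; comparing the ray against every point in each cluster is a possible refinement.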

C. Results
In order to examine the accuracy of this object localization method, the target object was placed at 20 different positions on the table, all within view of the Kinect, the NI 1764 camera, and the RFID reader. During the experiment 19 out of 20 trials resulted in successful object detection. This means that in one trial an error occurred when determining the intersection of the tag direction vector and the point cloud. After object detection was completed, the robot moved on to the task of grasping the object, where 17 out of 19 trials were successful. The results show that our system enables the robot to quickly and accurately find the desired object by optically localizing passive UHF RFID tags with integrated LEDs.

VIII. CONCLUSION
This paper addresses the issue of locating passive RFID tags in uncontrolled and unstructured environments by augmenting the tags with an LED that can be flashed when commanded by an RFID reader. This passive (i.e. battery-free) solution overcomes the multi-path issue faced by traditional RFID location methods and provides greater position accuracy than previously reported methods.
A prototype of a passive, LED enhanced RFID tag is presented using the WISP platform, and methods for manufacturing a low-cost, high-volume version are discussed. In its most basic form the LED enhanced tag provides a highly effective method for guiding people to tagged objects that can be individually addressed with an RFID reader.
More sophisticated methods of computerized tag localization are demonstrated using both a single-camera approach for planar 3D tag estimation and stereo cameras for full 3D tag localization. Both of these methods use an external protocol sniffer to trigger the cameras to capture the brief LED flashes from the RFID tags. These techniques show that the tags can be localized to within 10-20 mm.
A final demonstration of the utility of this new capability is shown using the PR2 robot from Willow Garage. In this example tagged objects on a table are individually commanded to blink and their location is identified by the camera system on the robot. Using this information the PR2 robot is then able to efficiently and repeatedly pick up objects from the table.