Imagine if you could walk around the airport with your suitcase following you at every turn as you stop, speed up, or change direction, completely hands-free. Our project is to construct a program from scratch for a Turtlebot to identify and follow the likeliest red target by controlling its angle and distance from it. Also in the works is having the Turtlebot track and follow a particular face. By implementing non-AR-tag vision, our project has to overcome challenges such as accurately identifying its target against constantly changing surroundings and mirroring the target's movement changes in real time. This dynamic robotic shadowing technology could be incorporated into applications that make people's movement more convenient.
Robustness: Since we chose to identify a red target, one design challenge is accurately identifying the target when there are multiple red objects in the turtlebot's view. Even with OpenCV's contour detection and approxPolyDP, a red distractor in the surroundings may interfere with centroid identification, and an outlying point risks distorting the shape of the identified target.
Durability: To make the turtlebot follower more durable, we would need to ensure that it behaves stably under different lighting conditions, room settings, and so on. Different settings can make target detection less reliable, as the detection may fall outside the tolerated range.
Efficiency: A big issue we faced was the image-processing delay, which caused overshoot and significant oscillation. We mitigated this by adding derivative control and tuning speed and tolerance constraints.
We attached a red target onto the ridgeback to create a moving target. A turtlebot was programmed to follow this target.
Launch follower.launch to start running get_paper_centroid.py. Both get_paper_centroid and get_paper_face_centroid share a similar architecture, as seen below:
This is the original image, subscribed from the topic camera/rgb/image_color.
We developed a Python parser for the ~100 MB topic output file in order to reconstruct a scene (a series of ~25 frames). We could then use a Jupyter notebook with different Python libraries to test out different algorithms.
We converted the image to HSV (hue, saturation, value) so we could filter out all colors except a range of red hues. The white regions are the pixels that survived the filter; the black regions have been filtered out.
We reinsert the original redness into the image purely for visualization; this is not essential for detection but is useful for initial threshold testing.
We applied a Gaussian blur to the image to smooth the white regions into a solid block, then converted the result to black and white with a set light threshold.
We used OpenCV's contour detection to find all the contours in the blurred, thresholded image and then selected the contour with the maximum area. We reinserted this maximum contour into the original image to test its performance. This contour is effectively a bounding region for the red folder of interest.
We used OpenCV's approxPolyDP, which implements the Douglas-Peucker algorithm on a particular contour. This algorithm fits a polygon to the contour given a particular error threshold. Since we knew the red folder would always appear as a quadrilateral, we implemented a controlled search over the error parameter to find a threshold that generates a quadrilateral. This generally gives us the four corners of the quadrilateral, which we can use to find the centroid.
As one of our reach goals, we extended the recognition pipeline, with the existing controls, to use face detection to obtain the centroid. We used a pre-trained OpenCV classifier of the kind traditionally used for face detection in webcams. This model detects generic facial features; training it on images of our particular target would make it much better at following one specific person. This approach has great potential for high accuracy with a simple bounding box and has been used many times in the past.
With the desired x, y, z coordinates, we control the linear x and angular z velocities of the turtlebot. This allows us to set the angular direction and forward velocity so we can move the bot wherever we want. We set a reference distance of 0.75 m for the turtlebot to maintain from the target. The turtlebot tries to keep the centroid of the target at the center of the image it sees, within some tolerance. Both angular and linear velocities were controlled with proportional feedback controllers, and the individual proportional gains were tuned for best performance.
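A minimal sketch of this proportional feedback scheme; the gains and tolerances below are illustrative placeholders, not the tuned values, and the returned pair would be published as a geometry_msgs/Twist in ROS:

```python
def follow_command(centroid_x, image_width, depth_z,
                   ref_depth=0.75, k_lin=0.4, k_ang=0.002,
                   x_tol_px=20, z_tol=0.05):
    """Proportional feedback sketch: return (linear_x, angular_z)
    velocity commands.  Gains and tolerances are assumed values."""
    # Angular: steer the target centroid toward the image center,
    # with a dead band of x_tol_px pixels.
    x_error = centroid_x - image_width / 2.0
    angular_z = -k_ang * x_error if abs(x_error) > x_tol_px else 0.0
    # Linear: hold the 0.75 m reference distance to the target.
    z_error = depth_z - ref_depth
    linear_x = k_lin * z_error if abs(z_error) > z_tol else 0.0
    return linear_x, angular_z
```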
Below are the videos of our project in action. The first two videos demonstrate the turtlebot tracking and following the red target on the moving ridgeback. The last video shows the turtlebot performing facial recognition and maintaining its angle and distance from the targeted face.
Our turtlebot is able to track and follow the moving red target, a red folder in this case. In our finished solution, the turtlebot uses its RGB camera and depth sensor to acquire the position of the red target and the estimated depth of each pixel for that snapshot of its surroundings. Once the x, y, z coordinates of the centroid are calculated, the turtlebot actuates with proportional feedback control.
One of the challenges we faced was providing a constant velocity command for more than 0.6 seconds. Although the turtlebot requires this to move, its velocity is constantly being updated as it detects changes in the target's movement and position. We also experimented with Zumys as our moving target, but we had difficulty controlling them. For the red-folder detection, we also had to minimize instances of confusion when the turtlebot sees multiple red objects in its view: the turtlebot's constantly changing position in space makes reliably detecting one red target a challenge. In particular, because there is a delay between the turtlebot's perception of the target's change in movement and the execution of the turtlebot's own movement, the turtlebot may lose sight of the target or associate incorrect depth data with the centroid it identified, since the centroid may have since moved to a different point in space. We overcame this problem by taking a snapshot of the depths associated with the image at the moment the turtlebot begins processing its surroundings to find the target centroid.
Our solution performs well under most circumstances, though one flaw is the delay between the turtlebot's image capture and its execution of motion, which can cause the turtlebot to overshoot while turning and then oscillate as it corrects for the overshoot. If we had more time, we would try to implement a more robust way of predicting the target's movement in the next time frame to account for this delay and minimize the risk of overshooting. If time allowed, we were also interested in programming the turtlebot to identify a target containing words (via a convolutional neural net / OCR) and to follow it while avoiding obstacles.
a) Girish Balaji - EECS Major, Interested in computer vision, machine learning, and robotics
Rushil Goradia - ME/EECS Major, Senior, Interested in Controls and state estimation.
Cassie Yin - EECS Major, Junior, Interested in signal processing, circuits, and software development.
b) Major contributions of each team member:
Girish - Designed, researched, and implemented vision processing algorithms used for red folder and face detection; developed data pipeline from turtlebot
Rushil - Designed the ROS framework and topics, and implemented the controllers for linear and angular velocities.
Cassie - Constructed the website and video; developed and tested the controller with integration of image processing.
code, URDFs, and launch files
data sheets for components used
additional videos and images
Final Project Video (click to below to view on youtube):