This weekend was dedicated to learning and experimenting with GStreamer – an open-source library and framework for constructing audio and video processing pipelines. Despite the weekend being spoiled by lots of bad luck (power outages, Internet down, etc.), I managed to beat the hell out of Murphy and get some work done!
My hidden agenda is of course to find a good audio/video library to accompany a software-defined radio created using GNU Radio and the Universal Software Radio Peripheral (USRP), and eventually to be able to transmit real-time high-definition video over the air. While GNU Radio and the USRP can take care of everything related to software radio and RF, I am still looking for a good framework for flexible audio/video processing.
The specific functionality I am looking for:
- Capture video from V4L2 sources, e.g. webcams
- Perform some basic processing such as scaling, cropping, filtering
- Encode using different codecs, e.g. H.264, Theora
- Mux more than one video stream into one container
- Stream using UDP (necessary to interface with GNU Radio)
- Provide a flexible framework that allows experimenting with codecs, formats, etc.
I have already experimented with VLC and ffmpeg with mixed success; see for example A simple way to get video in and out of GNU Radio. While I could get the job done, the result was not quite adequate and I wasn't particularly happy with it. They are great tools for offline processing and transcoding, but not quite as flexible as I'd like them to be for real-time processing. Therefore I decided that I also need to evaluate GStreamer. For a quick overview of what GStreamer is, see What is GStreamer?
The first thing I noticed is that there is no specific “Quick Start User Guide”. This is probably because the primary use of GStreamer is as a video processing library used by multimedia applications. So the “read this first” document is titled Application Development Manual, and it explains everything you need to know about writing applications that use GStreamer. This is great, but not quite what I was looking for.
I already knew that GStreamer also has a command line tool that allows building a complete audio/video processing pipeline from the command line without requiring any C or Python code. These pipelines are very similar to a GNU Radio flow graph, which is one of the reasons I am interested in GStreamer. Fortunately, the GStreamer FAQ has a section called Using GStreamer that lists several examples of building processing pipelines on the command line. Combined with a few third party tutorials such as this, this, this and this, the online API docs, man gst-launch and the gst-inspect tool, I managed to accomplish most of what would be covered by an “Introduction to GStreamer” training course.
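The gst-inspect tool deserves a special mention: without arguments it lists all installed plugins and their elements, and given an element name it shows that element's pads, capabilities and properties, which is very handy when figuring out how two elements can be connected. For example:
gst-inspect
gst-inspect videotestsrc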
In the following sections I am going to list a few examples for how easy it is to achieve the functionality I am looking for using GStreamer. For more examples you can have a look at my work-in-progress wiki page where I am collecting my GStreamer shortcuts.
The Principle
A GStreamer pipeline is created by connecting various data sources, sinks and processing blocks (elements) in a data flow graph, very much like in GNU Radio. The diagram below shows the pipeline for a simple OGG video player:
The command line for this video player could look something like this:
gst-launch -v filesrc location="videotestsrc.ogg" ! oggdemux name="demux" \
demux. ! queue ! theoradec ! xvimagesink \
demux. ! queue ! vorbisdec ! audioconvert ! pulsesink
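As a side note, for plain playback there is also a convenience element called playbin that builds a suitable pipeline automatically from a URI (the file path below is just a placeholder):
gst-launch playbin uri=file:///path/to/videotestsrc.ogg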
Video Test Patterns
One of the cool things I liked right from the beginning is that there is a video test pattern source that can generate a video test stream in any size and format. This is very useful for testing the pipeline without any prerecorded or live video source.
gst-launch videotestsrc ! video/x-raw-rgb, framerate=25/1, width=640, height=360 ! ximagesink
This will generate an RGB test pattern of 640×360 pixels at 25 frames per second using the default pattern. It will look like this:
There are 16 predefined patterns that can be selected using the “pattern” property. Some of the patterns are even tunable using additional parameters; see the API docs for videotestsrc.
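For example, selecting the “snow” pattern (analogue TV noise) should be as simple as:
gst-launch videotestsrc pattern=snow ! video/x-raw-rgb, framerate=25/1, width=640, height=360 ! ximagesink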
Webcam Capture
Moving on from the test pattern to the webcam is easy: we replace the videotestsrc with a v4l2src – assuming that the webcam is a Video4Linux2 input device:
gst-launch v4l2src ! xvimagesink
This will start grabbing frames from the webcam and show them in a window using the highest available resolution, which for my Logitech QuickCam Pro 9000 is 1600×1200. To limit the frame size, pixel format and frame rate, we can insert a “caps filter” after the webcam. Caps filters are used to restrict the capabilities negotiated between two elements. For example, the video test source can probably generate almost any format and size, and the same goes for the video display. Obviously, GStreamer has to settle on one specific format, size, frame rate, etc. when creating the pipeline. With a caps filter we can limit the choices to a subset or to just one.
In the following pipeline we select a frame size of 320×240, a frame rate of 20 fps and the YUY2 pixel format (a.k.a. YUYV 4:2:2):
gst-launch v4l2src ! video/x-raw-yuv,format=(fourcc)YUY2,width=320,height=240,framerate=20/1 ! xvimagesink
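To verify which capabilities were actually negotiated, the -v option makes gst-launch print the caps on each pad, which is an easy way to see what the webcam really delivers:
gst-launch -v v4l2src ! video/x-raw-yuv,format=(fourcc)YUY2,width=320,height=240,framerate=20/1 ! xvimagesink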
Let me end the webcam section with a very cool thing! With GStreamer I can now capture the webcam at any supported size and format. But my Logitech QuickCam Pro 9000 has many UVC controls for adjusting the camera settings, e.g. brightness, contrast, gain and focus. I can access and control these settings using Guvcview even while I am capturing video using GStreamer. All I have to do is execute guvcview with the -o or --control_only command line option, which will enable the image controls in guvcview but let another application capture the video frames:
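In other words, running something like this in a second terminal while the gst-launch pipeline above is capturing should give me the guvcview control panel without touching the video stream:
guvcview --control_only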
Encoding and Muxing
Instead of displaying the video on the screen we can save it to a file. We do this by replacing the “xvimagesink” with a proper encoder, a muxer and a file sink:
gst-launch -e videotestsrc ! video/x-raw-yuv, framerate=25/1, width=640, height=360 ! x264enc ! flutsmux ! \
filesink location=test.ts
This pipeline encodes the video to H.264, puts it into an MPEG-TS container and saves it to a file. The x264enc element has many options to tune the encoding process, but here we used the defaults. To see all the options we have to use “gst-inspect x264enc”, because there is no online doc. Unfortunately, the Fluendo MPEG-TS muxer, flutsmux, does not have many options, which I found very disappointing.
One of the reasons I am interested in MPEG-TS is that it should be able to provide a muxed stream with constant bitrate, which would greatly simplify the software radio implementation. Obviously, the codec cannot provide both a constant bitrate and efficient real-time compression at the same time. Instead, constant bitrate is achieved by setting an upper limit on the output bitrate of the codec and filling the “holes” with NULL packets at the container level. This is supported by the MPEG-TS container and is in fact a requirement for digital video broadcasting services such as DVB-T and DVB-S; however, none of the MPEG-TS muxers I have tried, including ffmpeg and VLC, seem to be able to provide a stream with constant bitrate. I have seen a patch for ffmpeg suggesting that the ffmpeg MPEG-TS muxer should support CBR in the very latest or a future release – I'm looking forward to trying it soon.
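For example, to cap the encoder output bitrate I would expect something along these lines to work (the x264enc bitrate property is given in kbit/s; the exact rate-control behaviour depends on the encoder version):
gst-launch -e videotestsrc ! video/x-raw-yuv, framerate=25/1, width=640, height=360 ! x264enc bitrate=1024 ! flutsmux ! \
filesink location=test_cbr.ts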
Of course, the greatest advantage of the container is that it allows us to multiplex several audio and video streams into one data stream. The example below multiplexes the video coming from the webcam, a video test pattern and the default audio input channel into one MPEG-TS stream:
gst-launch -e flutsmux name="muxer" ! filesink location=multi.ts \
v4l2src ! video/x-raw-yuv, format=(fourcc)YUY2, framerate=25/1, width=640, height=480 ! videorate ! \
ffmpegcolorspace ! x264enc ! muxer. \
videotestsrc ! video/x-raw-yuv, framerate=25/1, width=640, height=480 ! x264enc ! muxer. \
pulsesrc ! audioconvert ! lamemp3enc target=1 bitrate=64 cbr=true ! muxer.
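A quick way to sanity-check the resulting file is to let decodebin pick the decoders automatically and watch one of the video streams (which decoder is used depends on what is installed, e.g. the H.264 decoder from gst-ffmpeg):
gst-launch filesrc location=multi.ts ! decodebin ! ffmpegcolorspace ! xvimagesink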
Future experiments will include adding other data streams such as telemetry.
Text Overlay
There are already several plugins in GStreamer that can be used to add text on top of the live video. These include fixed text, the elapsed stream time, and the system date and time.
This will add the text “Hello” on top of the live video:
gst-launch videotestsrc ! video/x-raw-yuv,width=640,height=480,framerate=15/1 ! textoverlay text="Hello" ! \
ffmpegcolorspace ! ximagesink
The textoverlay plugin has many parameters that can be used for adjusting the text style, size, and positioning. These parameters are also inherited by the clockoverlay and the timeoverlay plugins.
gst-launch -v videotestsrc ! video/x-raw-yuv, framerate=25/1, width=640, height=360 ! \
timeoverlay halign=left valign=bottom text="Stream time:" shaded-background=true ! xvimagesink
will produce the video shown on the screenshot below:
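Similarly, the clockoverlay plugin can stamp the system date and time onto the video; something along these lines should work (the time-format string follows the usual strftime syntax):
gst-launch videotestsrc ! video/x-raw-yuv, framerate=25/1, width=640, height=360 ! \
clockoverlay halign=right valign=top time-format="%Y-%m-%d %H:%M:%S" shaded-background=true ! xvimagesink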
Conclusion
GStreamer is a really awesome audio and video processing framework with an architecture similar to that of GNU Radio. It is free and open source and works well on Linux. It already comes with many plugins for getting the job done, ranging from simple audio/video capture and playback to large video processing pipelines. The functionality can easily be extended by creating custom plugins, and the developer documentation appears to be rather comprehensive. There are already many applications written in C and Python that use GStreamer, which is a great advantage when looking for example code.
There are still many things I have to experiment with before I can be sure that GStreamer is the way to go for creating a bridge between my webcam and GNU Radio (e.g. picture-in-picture, network streaming); nevertheless, I have a strong feeling that experimenting with GStreamer for this purpose will lead to something useful. The fact that GStreamer is also available on embedded platforms such as the BeagleBoard and other OMAP-based boards, where it can take advantage of the hardware DSP, is definitely a big plus. After all, we'll also need a video processing subsystem for our spaceship 😉