Demystifying C++ Streams

One of the first differences people notice between C and C++ shows up in the simple hello world program:

// C
#include <stdio.h>

int main()
{
  printf("Hello, World!\n");
}

// C++
#include <iostream>

int main()
{
  std::cout << "Hello, World!" << std::endl;
}

Where C uses printf and fprintf in combination with a complex API for file handles to manipulate output streams, C++ streams offer a much more intuitive and unified way to think about input and output. On first inspection it may seem that some fancy C++ language feature is being employed to let us interact with IO data in this way, but the reality is a little more mundane.

The Smoke and Mirrors of Operator Overloading

C++ offers more than C in many ways beyond the addition of classes. One feature that cannot be overlooked in C++ is the ability to perform function and operator overloading. This capability transforms the very restrictive language of C into one of the most expressive languages available today. Any user-defined type in C++ can overload almost any operator, with very few restrictions. People often take advantage of this to allow their types to interact with the IO stream objects from the language’s standard library:

std::ostream &operator<<(std::ostream &stream, Value val) {
    stream << val.toString();
    return stream;
}

In this example we have made a very simple overload of the << operator, which means we can now use the user-defined Value type in a stream statement:

std::cout << "The value is " << Value(5) << std::endl;

This very common way of using the operators << and >> gives them the rightful name of Stream Operators.
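For anyone who wants to try this out, here is a minimal self-contained sketch. The Value class below is a hypothetical stand-in (its real definition is never shown here), and describe is just a helper name I have made up so the result can be inspected as a string:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical Value type: wraps an int and can render itself as text.
class Value {
public:
    explicit Value(int v) : value_(v) {}
    std::string toString() const { return std::to_string(value_); }
private:
    int value_;
};

// Same overload as in the text, except that the Value is taken by const
// reference to avoid a copy.
std::ostream &operator<<(std::ostream &stream, const Value &val) {
    stream << val.toString();
    return stream;
}

// Streams into a string rather than std::cout so the result can be checked.
std::string describe(const Value &val) {
    std::ostringstream out;
    out << "The value is " << val;
    return out.str();
}
```

Because the overload takes a std::ostream, the same function works for std::cout, file streams and string streams alike.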

But is this really an accurate name for what these operators do? Certainly when using the C++ standard library that is clearly what they are intended for, but ultimately it is the standard library alone that gives these operators their name. Consider this alternative operator overload that achieves the exact same outcome:

std::ostream &operator&&(std::ostream &stream, Value val) {
    stream << val.toString();
    return stream;
}

// << binds more tightly than &&, so the && expression needs parentheses.
(std::cout && Value(5)) << std::endl;

You might think that this is displeasing to look at (and you certainly would not be wrong) but ultimately this code is no less valid. So it would seem that these so-called stream operators do not gain their name from an explicit rule set down by the language itself, but from a convention that has been set within the standard library.
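A small sketch of the && variant, assuming a hypothetical Value struct with a toString member. Two caveats are worth knowing: the operators keep their built-in precedence, so << still binds more tightly than && and the && expression has to be parenthesised, and an overloaded && no longer short-circuits.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Value type.
struct Value {
    int n;
    std::string toString() const { return std::to_string(n); }
};

// An unconventional but legal choice of operator for output.
std::ostream &operator&&(std::ostream &stream, const Value &val) {
    stream << val.toString();
    return stream;
}

std::string demo() {
    std::ostringstream out;
    // Without the parentheses this would parse as out && (Value{5} << '\n').
    (out && Value{5}) << '\n';
    return out.str();
}
```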

C++ Streams as a Social Construct

Once we look at streams through this alternative lens, it becomes much less daunting to wonder how these magical IO streams work under the hood. Let us now look a little more deeply at our original overload to work out how it makes our new Value object fit so nicely with the standard library’s std::ostream type.

std::ostream &operator<<(std::ostream &stream, Value val) {
    stream << val.toString();
    return stream;
}

Ultimately this is just a simple function that we would be perfectly within our rights to call directly without the language’s syntactic sugar:

operator<<(std::cout, Value(5));

This function has a return value which will be a reference to the std::cout object. So taken in the context of our original statement we can now see why the std::endl stream modifier also plays nicely next to our user defined Value type:

std::cout << Value(5) << std::endl;
|                   |
--- std::ostream ----
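To make the chaining concrete, here is a sketch that writes the same statement twice: once with operators and once as nested function calls. Value is again a hypothetical stand-in, and a string literal is used as the second operand because its operator<< is a free function and so can be called in this style.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Value type.
struct Value {
    int n;
    std::string toString() const { return std::to_string(n); }
};

std::ostream &operator<<(std::ostream &stream, const Value &val) {
    stream << val.toString();
    return stream;
}

// The usual chained spelling.
std::string viaOperators() {
    std::ostringstream out;
    out << Value{5} << "!";
    return out.str();
}

// The same chain as nested calls: the inner call returns the stream, which
// becomes the first argument of the outer call.
std::string viaCalls() {
    std::ostringstream out;
    operator<<(operator<<(out, Value{5}), "!");
    return out.str();
}
```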

std::endl isn’t really interacting with the Value object at all, which we would become painfully aware of if we overloaded the operator a little differently, in a way that is still perfectly legal C++ (it’s no wonder the language is so capable of producing spaghetti code):

Value operator<<(std::ostream &stream, Value val) {
    stream << val.toString();
    return val;
}

Here the operator would not work at all as intended:

std::cout << Value(5) << std::endl;
|                   |
------- Value -------
// Compile error: there is no operator<< that streams std::endl into a Value.

So hopefully this illustrates the lack of a special nature around streams and the way they work…

But hang on a minute, stream manipulators, surely there is some fancy magic going on there?

How Stream Manipulators are Manipulating You

Yet again, much of the magic of stream manipulators is in fact a very simple trick. Take the one we have already looked at, std::endl. This is a very simple manipulator that adds a new line to the output stream and then flushes it, ensuring that all previous output has actually moved out of the stream buffer. Knowing how the stream operator works, we can quite easily imagine the function that might complete this task:

std::ostream &operator<<(std::ostream &stream, Endl endl) {
    stream << "\n";
    stream.flush();
    return stream;
}

Now obviously that’s not the real implementation. For starters, there is no Endl object: std::endl is itself a function that takes the stream and returns it, and the stream has an operator<< overload that simply calls such functions. Ultimately, though, it performs the same task, that is, writing a new line and then calling the function on the stream object which flushes the data.
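The function-based mechanism is easy to try for yourself. In this sketch my_endl is a made-up name, not something from the standard library; it works because std::ostream has an operator<< overload that accepts a pointer to a function with this signature and calls it on the stream.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical manipulator: writes a newline and flushes, much like std::endl.
std::ostream &my_endl(std::ostream &stream) {
    stream.put('\n');
    stream.flush();
    return stream;
}

std::string demo() {
    std::ostringstream out;
    // No call parentheses: the function itself is handed to operator<<,
    // which then invokes it with the stream as its argument.
    out << "first" << my_endl << "second" << my_endl;
    return out.str();
}
```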

But how about more interesting stream manipulators?

std::cout << std::boolalpha << true;

This fancy manipulator will ensure that any subsequent booleans piped to the cout stream appear in their text form (“true”) rather than as a numeric value. So an initial thought about how this might work would be to consider it similarly to how IO streams in Java operate. In Java, different types of streams are combined to take advantage of their differing characteristics:

new ObjectOutputStream(new FileOutputStream("save.dat"));

Here, anything written to the ObjectOutputStream will then be pushed out onto the FileOutputStream and written to disk in a pipeline fashion through each of the stream objects.

So is this what we are seeing here with std::boolalpha? Nope, not even close. As the documentation from cppreference.com explains, std::boolalpha:

Enables the boolalpha flag in the stream str as if by calling str.setf(std::ios_base::boolalpha)

So again this seemingly clever stream manipulator is nothing more than a bit of syntactic sugar using operator overloading entirely provided by the standard library, to flip a status bit on the stream object.
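A quick way to convince yourself of this is to compare the manipulator with a direct call to setf. The three helper functions below are names of my own choosing; each writes the same two booleans into a string stream.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Sets the flag through the manipulator.
std::string withManipulator() {
    std::ostringstream out;
    out << std::boolalpha << true << ' ' << false;
    return out.str();
}

// Sets the same flag directly on the stream object.
std::string withSetf() {
    std::ostringstream out;
    out.setf(std::ios_base::boolalpha);
    out << true << ' ' << false;
    return out.str();
}

// Leaves the flag alone, so booleans are written as numbers.
std::string withoutFlag() {
    std::ostringstream out;
    out << true << ' ' << false;
    return out.str();
}
```

The first two produce "true false" and the last produces "1 0".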

Do you feel conned yet?

This total lack of interesting things going on with C++ streams may leave you a little dissatisfied, but I would argue that it points to the expressive power of the language’s simpler features. Because there is no special connection between the stream operators and IO streams in the standard library, there is nothing stopping us from creating our own stream-style library that is just as intuitive, if not more so.

Streams for Something Completely Different

A current project of mine relates to the generation of classical music using classical harmony theory and simple algorithms to produce a (hopefully) pleasant listening experience. The process of composition can be looked at as a series of transformations, gradually increasing the complexity of the piece. Starting with a chord progression, we can then produce harmonically consistent lines of music, and then add rhythmic changes to these lines to make them more interesting. The transformation process involved in this kind of program makes it an ideal candidate to be represented using the stream design pattern. We will begin by defining the inputs and outputs to the stream:

ah::Melody melody;
ah::NoteStream ns;
ns << melody.getSequence();
std::vector<Note> modifiedMelody = ns.getSequence();

This program will simply push the notes of a melody onto the stream without performing any kind of modification. We can then collect the results of our stream using ns.getSequence() to convert it back to a usable vector of notes. Naturally we will want to be doing something a little more interesting than this, so we can start to define some simple stream modifiers.
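As a rough idea of what such a pass-through stream could look like, here is a minimal sketch. The Note layout and the NoteStream internals are assumptions made for illustration, not the actual ah library, and countAfterPush is only a helper for trying it out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace ah {

// Hypothetical note representation: a pitch name and a length in beats.
struct Note {
    std::string pitch;
    int beats;
};

// Pass-through stream: stores whatever is pushed onto it, unchanged.
class NoteStream {
public:
    NoteStream &operator<<(const Note &note) {
        output_.push_back(note);
        return *this;  // returning the stream is what allows chaining
    }
    NoteStream &operator<<(const std::vector<Note> &notes) {
        output_.insert(output_.end(), notes.begin(), notes.end());
        return *this;
    }
    const std::vector<Note> &getSequence() const { return output_; }

private:
    std::vector<Note> output_;
};

} // namespace ah

// Pushes a three-note sequence and then a single note; returns the total.
std::size_t countAfterPush() {
    ah::NoteStream ns;
    ns << std::vector<ah::Note>{{"C", 1}, {"C", 1}, {"E", 1}} << ah::Note{"G", 1};
    return ns.getSequence().size();
}
```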

Note tying stream modifier

Let us suppose that our input ‘melody’ object has produced a melody without any rhythmic information, using one note for each beat:

music_fig1

Now as a first very basic change we can tie consecutive notes of the same pitch together, producing:

music_fig2

This manipulator immediately shows why we may want to perform this processing in a streaming fashion. With music being a continuous sequence of notes, under this manipulator we cannot know the final length of the most recent note at any particular time, as a new note at the same pitch may be about to get pushed onto the stream. Because of this, we need the manipulator to hold its own buffer, and only push out to the stream once it knows there are no more notes coming at the same pitch.

ns << ah::tying_manip << C << C << E;

Here we would expect a C of two beats in the output buffer, as the following E has ensured that the C cannot be any longer, while the E itself is still sitting in the tying_manip buffer. But how can we ensure that whatever we push into the tying_manip buffer actually makes it out again on output? How has tying_manip been linked to the output? A look at the types in the expression reveals the answer:

ns << ah::tying_manip << C
|                   |    |
------NoteStream-----   Note

NoteStream has an overload that takes the manipulator and applies it to the rest of the stream, similar to how the std::boolalpha manipulator works on a standard output stream. In introducing a second manipulator for NoteStream, however, I hope to show an improvement (a bit controversial, I know) on the standard library’s use of simple flags for manipulators.
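Before moving on, the buffering behaviour of the tying manipulator can be sketched on its own. This is only an illustration under assumptions: the Note layout and the process and flush member functions are my guesses at a workable interface, not the real ah::tying_manip.

```cpp
#include <string>
#include <vector>

// Hypothetical note representation: a pitch name and a length in beats.
struct Note {
    std::string pitch;
    int beats;
};

// Hypothetical tying manipulator with its own one-note buffer.
class TyingManip {
public:
    // Takes incoming notes and returns only those whose length is now final.
    std::vector<Note> process(const std::vector<Note> &input) {
        std::vector<Note> finished;
        for (const Note &note : input) {
            if (hasPending_ && pending_.pitch == note.pitch) {
                pending_.beats += note.beats;  // same pitch: extend the held note
            } else {
                if (hasPending_) finished.push_back(pending_);
                pending_ = note;               // hold the new note back for now
                hasPending_ = true;
            }
        }
        return finished;
    }

    // Moves the held note, if any, into the buffer provided.
    void flush(std::vector<Note> &buffer) {
        if (hasPending_) buffer.push_back(pending_);
        hasPending_ = false;
    }

private:
    Note pending_{};
    bool hasPending_ = false;
};
```

Feeding in C, C, E one note at a time returns nothing for the two Cs, then a two-beat C once the E arrives. The E itself only appears after a flush.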

The problem with flags for streams

Flags generally work fine for most things on the output stream. They do, however, force manipulators to interact commutatively: the order in which they are applied cannot matter. This can seriously reduce the range of operations available to us, and also does not make a great deal of sense in the context of a stream object that so clearly indicates order.
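The commutative behaviour is easy to observe with the standard library's own manipulators. In this small sketch (the two function names are mine), std::hex and std::showbase are applied in both orders and the output is identical, because each one only sets a flag on the stream.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Number base first, then the base prefix.
std::string hexThenShowbase() {
    std::ostringstream out;
    out << std::hex << std::showbase << 255;
    return out.str();
}

// The same two manipulators in the opposite order.
std::string showbaseThenHex() {
    std::ostringstream out;
    out << std::showbase << std::hex << 255;
    return out.str();
}
```

Both functions return "0xff".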

To illustrate this point, we can introduce a second manipulator which performs a different operation, adding passing notes. Passing notes are a very simple musical device to make melodies more interesting by putting a note between notes that are a third apart (if that doesn’t make sense don’t worry, just look at the pretty notes below and how they relate to the original melody):

music_fig3

By looking at these two different manipulators together, we can see how the commutative nature forced upon us using flags on the stream will not allow certain processing.

First we will take our original melody:

music_fig1

If we apply the tying manipulator first we will produce:

music_fig2

And then the passing note manipulator:

music_fig4.1

But should we choose to perform this operation the other way round, we would create a slightly different melody.

music_fig4.2

In order to take advantage of the non-commutative nature of these functions, rather than using flags, we can instead implement these manipulators as if they were gates being placed in front of the stream, with each gate having an effect on the next manipulator in the chain:

note_stream_fig5

This can be done by giving the NoteStream class a vector which holds these manipulators in the order in which they were pushed onto the stream:

ah::NoteStream defaultStream;
defaultStream << ah::tying_manip << ah::passing_manip;

// Equivalent: hand the manipulators to the constructor in the same order.
ah::NoteStream initialized({ah::tying_manip, ah::passing_manip});

assert(defaultStream == initialized);

Now it becomes a simple process of piping data that gets pushed onto the stream into each of these manipulators in order:

ah::NoteStream &operator<<(ah::NoteStream &stream, const Note &note)
{
  // Pass the new note through each manipulator in the order it was added.
  std::vector<Note> noteBuffer = { note };
  for (auto &manipulator: stream.manipulators)
  {
    noteBuffer = manipulator.process(noteBuffer);
  }
  // output is NoteStream's buffer of fully processed notes.
  stream.output.insert(
    stream.output.end(),
    noteBuffer.begin(),
    noteBuffer.end());
  return stream; // Returning the stream allows chaining: ns << C << C << E
}

To finish off, we must provide another manipulator that is similarly crucial to the standard library’s own streams, a flush command. Because each of the manipulators can have their own internal buffers, in which they can hold onto notes before they are pushed into the next part of the stream, it is important that we provide a way in which we can inform them that we have come to the end of an output section.

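void flush(ah::NoteStream &stream)
{
  std::vector<Note> noteBuffer = {};
  for (auto &manipulator: stream.manipulators)
  {
    noteBuffer = manipulator.process(noteBuffer);
    // A manipulator's flush function moves any notes still held in the
    // manipulator's buffer into the buffer provided.
    manipulator.flush(noteBuffer);
  }
  // Whatever comes out of the last manipulator belongs in the output.
  stream.output.insert(
    stream.output.end(),
    noteBuffer.begin(),
    noteBuffer.end());
}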

This procedure should now flush out any final data stored in the manipulator buffers, finishing the output to the note stream.

note_stream_fig6
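Putting the pieces together, here is a compact end-to-end sketch of the gate idea. Everything in it is an assumption made for illustration: notes are reduced to a scale degree and a beat count, the manipulators share a made-up base class, and the passing-note rule is deliberately simplified so that the effect of ordering is easy to see.

```cpp
#include <cstdlib>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical note: a scale degree (C = 0, D = 1, E = 2) and a length in beats.
struct Note { int degree; int beats; };

// Hypothetical base class: holds back one note until the next one arrives.
class Manipulator {
public:
    virtual ~Manipulator() = default;
    // Feeds notes in and returns the notes that are ready to move on.
    std::vector<Note> process(const std::vector<Note> &input) {
        std::vector<Note> done;
        for (const Note &note : input) {
            if (holding_) combine(held_, note, done);
            else held_ = note;
            holding_ = true;
        }
        return done;
    }
    // Moves the held note, if any, into the buffer provided.
    void flush(std::vector<Note> &buffer) {
        if (holding_) buffer.push_back(held_);
        holding_ = false;
    }
private:
    // Decides what happens to the held note now that `next` is known.
    virtual void combine(Note &held, const Note &next, std::vector<Note> &done) = 0;
    Note held_{};
    bool holding_ = false;
};

// Ties consecutive notes of the same pitch into one longer note.
class TyingManip : public Manipulator {
    void combine(Note &held, const Note &next, std::vector<Note> &done) override {
        if (held.degree == next.degree) { held.beats += next.beats; return; }
        done.push_back(held);
        held = next;
    }
};

// Simplifying assumption: the passing note takes the last beat of the first
// note, so it is only added when that note is at least two beats long.
class PassingManip : public Manipulator {
    void combine(Note &held, const Note &next, std::vector<Note> &done) override {
        if (std::abs(held.degree - next.degree) == 2 && held.beats > 1) {
            done.push_back(Note{held.degree, held.beats - 1});
            done.push_back(Note{(held.degree + next.degree) / 2, 1});
        } else {
            done.push_back(held);
        }
        held = next;
    }
};

class NoteStream {
public:
    // Adds a manipulator to the end of the chain.
    NoteStream &operator<<(std::shared_ptr<Manipulator> manip) {
        manipulators_.push_back(std::move(manip));
        return *this;
    }
    // Pipes a note through every manipulator, in the order they were added.
    NoteStream &operator<<(const Note &note) {
        std::vector<Note> buffer{note};
        for (auto &manip : manipulators_) buffer = manip->process(buffer);
        output_.insert(output_.end(), buffer.begin(), buffer.end());
        return *this;
    }
    // Drains each manipulator in turn so that held notes reach the output.
    void flush() {
        std::vector<Note> buffer;
        for (auto &manip : manipulators_) {
            buffer = manip->process(buffer);
            manip->flush(buffer);
        }
        output_.insert(output_.end(), buffer.begin(), buffer.end());
    }
    const std::vector<Note> &getSequence() const { return output_; }
private:
    std::vector<std::shared_ptr<Manipulator>> manipulators_;
    std::vector<Note> output_;
};

// Runs the melody C C E through both manipulators in the chosen order.
std::vector<Note> run(bool tieFirst) {
    NoteStream ns;
    if (tieFirst) ns << std::make_shared<TyingManip>() << std::make_shared<PassingManip>();
    else ns << std::make_shared<PassingManip>() << std::make_shared<TyingManip>();
    ns << Note{0, 1} << Note{0, 1} << Note{2, 1};
    ns.flush();
    return ns.getSequence();
}
```

Under these assumptions, run(true) ties the two Cs first, which gives the passing manipulator a two-beat note to borrow from, and the result is C, D, E at one beat each. run(false) finds no note long enough to borrow from, so the result is simply a two-beat C followed by E. Same gates, different order, different melody.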

C++ Openness, A Blessing and a Curse

Many of the arguments made in favour of C++ tend to come down to its speed, direct control and interoperability with C, but I would argue that its openness to define things the way you want is the most fascinating aspect of the language. This can be a tremendous asset in creating intuitive, native-looking language features for libraries, such as IO streams. As someone with a keen interest in programming language design, I find it a breath of fresh air to be able to write out new language concepts with the safety wheels taken off, but as a developer working in a large C++ code base daily, I also find that the no-hand-holding approach of C++ can be a source of serious anxiety. Without proper oversight, communication and discipline, C++ code can be a minefield of “elegant” code, unintelligible to anyone beyond the original author. Operator overloading plays just a small role in the set of C++ features that make the language so controversial, and yet so powerful.