
Apache Thrift Tutorial – The Sequel

I’m going to cover building a simple C++ server using the Apache Thrift framework here, while my buddy Ian Chan will cover the front-end PHP interface in his own blog post.

The other day Ian and I were talking and thought it would be cool to do another Facebook/Apache Thrift tutorial, but this time he’d do the front-end client interface and I’d do the backend stuff. He really wanted to do an example of something that you’d find useful to send to a backend for processing from a PHP frontend client. So, we came up with the following thrift interface:

namespace cpp calculator

typedef list<double> Vector

enum BinaryOperation
{
  ADDITION = 1,
  SUBTRACTION = 2,
  MULTIPLICATION = 3,
  DIVISION = 4,
  MODULUS = 5,
}

struct ArithmeticOperation
{
  1:BinaryOperation op,
  2:double lh_term,
  3:double rh_term,
}

exception ArithmeticException
{
  1:string msg,
  2:optional double x,
}

struct Matrix
{
  1:i64 rows,
  2:i64 cols,
  3:list<Vector> data,
}

exception MatrixException
{
  1:string msg,
}

service Calculator
{
  /* Note you can't overload functions */

  double calc (1:ArithmeticOperation op) throws (1:ArithmeticException ae),
  Matrix mult (1:Matrix A, 2:Matrix B) throws (1:MatrixException me),
  Matrix transpose (1:Matrix A) throws (1:MatrixException me),
}

As you can see, we defined a simple calculator with a couple more functions for doing some basic matrix operations (yes, this seems to come up often); something that would suck in PHP. We generated the code with:

thrift --gen cpp calculator.thrift

And away we go with the autogenerated C++ code. After you run the thrift generation for C++, it’ll make a directory called gen-cpp/. Under this, you can find relevant files and classes to do work based on your Thrift definition.


$ ls gen-cpp/
calculator_constants.cpp  Calculator_server.skeleton.cpp
calculator_constants.h    calculator_types.cpp
Calculator.cpp            calculator_types.h
Calculator.h

I renamed the generated Calculator_server.skeleton.cpp file (you’ll want to make sure you do this so your work isn’t overwritten the next time you generate Thrift code), and filled in the function stubs adding more functionality as necessary. This file is the only file containing code which you need to edit for your server – you need to fill in the logic here. The other autogenerated files contain necessary transport class, struct, and function code for your server to work. On the other end of things, Ian generated the PHP code and filled in those stubs – you can find his blog post for this project here. We also threw all the code online under Ian’s Github account – you can find all the source here.

Below I’ll list the code I filled in for the backend-side of this project.


#include "Calculator.h"
#include <stdint.h>
#include <cmath>
#include <protocol/TBinaryProtocol.h>
#include <server/TSimpleServer.h>
#include <transport/TServerSocket.h>
#include <transport/TBufferTransports.h>
#include <thrift/concurrency/ThreadManager.h>
#include <thrift/concurrency/PosixThreadFactory.h>
#include <server/TThreadedServer.h>
using namespace ::apache::thrift;
using namespace ::apache::thrift::protocol;
using namespace ::apache::thrift::transport;
using namespace ::apache::thrift::server;
using namespace ::apache::thrift::concurrency;
using boost::shared_ptr;
using namespace calculator;

class CalculatorHandler : virtual public CalculatorIf
{
private:
/* It might be cleaner to stick all these private class functions inside some other class which isn't related to the Thrift interface, but for the sake of brevity, we'll leave them here. */
  double
  __add (double lh_term, double rh_term)
  {
    return (lh_term + rh_term);
  }

  double
  __sub (double lh_term, double rh_term)
  {
    return (lh_term - rh_term);
  }

  double
  __mult (double lh_term, double rh_term)
  {
    return (lh_term * rh_term);
  }

  double
  __div (double lh_term, double rh_term)
  {
    if (rh_term == 0.0)
      {
        ArithmeticException ae;
        ae.msg = std::string ("Division by zero error!");
        throw ae;
      }

    return (lh_term / rh_term);
  }

  double
  __mod (double lh_term, double rh_term)
  {
    if (rh_term == 0.0)
      {
        ArithmeticException ae;
        ae.msg = std::string ("Modulus by zero error!");
        throw ae;
      }

    return std::fmod (lh_term, rh_term);
  }

public:

  CalculatorHandler ()
  {
  }
/* Given the ArithmeticOperation, ensure it's valid and return the resulting value. */
  double
  calc (const ArithmeticOperation& op)
  {
    switch (op.op)
      {
      case ADDITION:
        return __add (op.lh_term, op.rh_term);

      case SUBTRACTION:
        return __sub (op.lh_term, op.rh_term);

      case MULTIPLICATION:
        return __mult (op.lh_term, op.rh_term);

      case DIVISION:
        return __div (op.lh_term, op.rh_term);

      case MODULUS:
        return __mod (op.lh_term, op.rh_term);

      default:
        ArithmeticException ae;
        ae.msg = std::string ("Invalid binary operator provided!");
        throw ae;
      }
  }
/* Multiply A and B together, placing the result in the "return value" C, which is passed as a Matrix reference parameter. */
  void
  mult (Matrix& C, const Matrix& A, const Matrix& B)
  {
    /* A x B is defined whenever A has as many columns as B has rows. */
    if (A.cols == B.rows)
      {
        double tmp;

        C.rows = A.rows;
        C.cols = B.cols;
        C.data.resize (C.rows);

        for (uint64_t i = 0; i < A.rows; i++)
          {
            C.data[i].resize (C.cols);

            for (uint64_t j = 0; j < B.cols; j++)
              {
                tmp = 0;
                for (uint64_t k = 0; k < B.rows; k++)
                  {
                    /* Dot product of row i of A with column j of B. */
                    tmp += A.data[i][k] * B.data[k][j];
                  }
                C.data[i][j] = tmp;
              }
          }
      }
    else
      {
        MatrixException me;
        me.msg = std::string ("Matrices have invalid dimensions for multiplication!");
        throw me;
      }
  }
/* Take the transpose of A and stuff it into the return Matrix T. */
  void
  transpose (Matrix& T, const Matrix& A)
  {
    T.rows = A.cols;
    T.cols = A.rows;
    T.data.resize (A.cols);

    for (uint64_t i = 0; i < A.rows; i++)
      {
        for (uint64_t j = 0; j < A.cols; j++)
          {
            T.data[j].push_back (A.data[i][j]);
          }
      }
  }
};

int
main (int argc, char **argv)
{
  int port = 9090;
  shared_ptr<CalculatorHandler> handler(new CalculatorHandler());
  shared_ptr<TProcessor> processor(new CalculatorProcessor(handler));
  shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
  shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
  shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());
  shared_ptr<ThreadManager> threadManager = ThreadManager::newSimpleThreadManager (4);
  shared_ptr<PosixThreadFactory> threadFactory = shared_ptr<PosixThreadFactory> (new PosixThreadFactory ());
  threadManager -> threadFactory (threadFactory);
  threadManager -> start ();

 /* This time we'll try using a TThreadedServer, a better server than the TSimpleServer in the last tutorial */
 TThreadedServer server(processor, serverTransport, transportFactory, protocolFactory);
 server.serve();
 return 0;
}
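The shape rule behind mult() can be tried out without Thrift at all: a product A x B only needs A to have as many columns as B has rows, and the result has A.rows rows and B.cols columns. Here is a minimal sketch of that rule, assuming a stand-in struct (PlainMatrix and multiply are hypothetical names of mine that only mirror the fields of the generated Matrix; they are not part of the generated code):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the Thrift-generated Matrix struct.
struct PlainMatrix
{
  long long rows;
  long long cols;
  std::vector<std::vector<double> > data;
};

// Returns A x B; throws std::invalid_argument if the shapes do not line up.
PlainMatrix
multiply (const PlainMatrix& A, const PlainMatrix& B)
{
  if (A.cols != B.rows)
    throw std::invalid_argument ("Matrices have invalid dimensions for multiplication!");

  // Sanity check: the declared row counts should match the stored data.
  assert (A.data.size () == static_cast<std::size_t> (A.rows));
  assert (B.data.size () == static_cast<std::size_t> (B.rows));

  PlainMatrix C;
  C.rows = A.rows;
  C.cols = B.cols;
  C.data.assign (C.rows, std::vector<double> (C.cols, 0.0));

  for (long long i = 0; i < A.rows; i++)
    for (long long j = 0; j < B.cols; j++)
      for (long long k = 0; k < A.cols; k++)
        C.data[i][j] += A.data[i][k] * B.data[k][j];  // row i of A dot column j of B

  return C;
}
```

In the real handler the result is written into the Matrix& C parameter and the error is reported with a MatrixException instead; returning by value and using std::invalid_argument here just keeps the sketch free of generated code.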

Finally, the code was compiled either with the Makefile I posted onto Ian’s Github repo, or the following build script:

g++ -o calc_server -I./gen-cpp -I/usr/local/include/thrift/ CalculatorServer.cpp gen-cpp/calculator_constants.cpp gen-cpp/Calculator.cpp gen-cpp/calculator_types.cpp -lthrift

So this really isn’t the most complicated program in the world, but it gets the job done fairly simply and effectively (and yes, it actually works!). Note that as opposed to last time I used a TThreadedServer as the base Thrift server type here. It’s a little more complicated to set up, but it is more useful than a single-threaded server because it can serve several clients at once. Interesting things to note:

  • Use of TThreadedServer for a multithreaded server
  • You fill in Thrift exceptions like any other struct, and throw them like any other exception
  • You can use typedefs and enums
  • You can’t overload function names

The last point is a real pain, as far as I am concerned. I’m not sure why the Thrift people couldn’t just mangle the function names so that they resolve to unique entities, but whatever. Anyways, what’s really cool is that we managed to build two cooperating programs in two completely different languages using a single Thrift definition file: a backend in C++ and a frontend in PHP. Hope you find this useful – happy hacking!
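On the exceptions point in the list above, here is a minimal sketch of the "fill it in like a struct, throw it like an exception" pattern. It assumes a stand-in type, since the generated ArithmeticException needs the Thrift headers: DemoArithmeticException, checked_div and describe_div are hypothetical names of mine, not generated code.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical stand-in for the generated ArithmeticException: plain public
// fields, plus the usual std::exception plumbing.
struct DemoArithmeticException : public std::exception
{
  std::string msg;
  double x;

  DemoArithmeticException () : x (0.0) {}
  const char* what () const noexcept override { return msg.c_str (); }
};

// Divides, or fills in and throws the exception when the divisor is zero.
double
checked_div (double lh_term, double rh_term)
{
  if (rh_term == 0.0)
    {
      DemoArithmeticException ae;
      ae.msg = "Division by zero error!";  // set fields like any other struct
      ae.x = lh_term;
      throw ae;                            // throw like any other exception
    }

  return lh_term / rh_term;
}

// Turns the outcome into a string, roughly what a caller might display.
std::string
describe_div (double lh_term, double rh_term)
{
  try
    {
      double q = checked_div (lh_term, rh_term);
      // Getting here means nothing was thrown, so the divisor was nonzero.
      assert (rh_term != 0.0);
      return std::to_string (q);
    }
  catch (const DemoArithmeticException& ae)
    {
      return ae.msg;
    }
}
```

With the real generated types, the same throw inside a handler method is what ends up being reported back to the client, as long as the exception is listed in that method's throws clause in the definition file.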


Facebook Thrift Tutorial

If you’ve never heard of Facebook Thrift before (or Apache Thrift, now that Facebook has released the source), then you’re missing out on an incredibly useful technology. Thrift is an RPC platform for developing cross-language services (it is also useful between programs written in the same language) which is far more efficient and far less time-consuming to build than some more traditional approaches. Supported languages include Java, Python, PHP, and a pile of others. If you have heard of Thrift before, then you’ll know that the online documentation sucks, and it’s fairly difficult to get started.

I’ve had a fair bit of experience with Thrift, and I thought I’d share some of what I’ve learned.

Thrift works by taking a language-independent Thrift definition file and generating source code from it. This definition file may contain data structure definitions, enumerations, constant definitions, exceptions, and service interfaces. A Thrift ‘service’ is the most important part – here, you’ll define what sort of API you’d like to use as a developer, and the Thrift code generation will output all the necessary serialization and transfer code. Let’s start with a simple example of a remote logging system in C++.

First of all, we’ll create a Thrift definition file called logger.thrift:

namespace cpp logger

struct LogMessage
{
  1:i64 timestamp,
  2:string message
}

struct LogMessageBatch
{
  1:list<LogMessage> msgs
}

exception LoggingException
{
  1:string msg
}

service Logger
{
  void log (1:LogMessage lm),
  void batch (1:LogMessageBatch lmb),

  LogMessage getLastMessage ()
}

Note that in the above, you can define structures which aggregate other structures. In this case, a LogMessageBatch contains a list of LogMessages. You can find out all the details about these files here. The next step is to generate the C++ code:

thrift --gen cpp logger.thrift

This will generate a new directory called “gen-cpp” which contains various files. There’s a lot of auto-generated code here, but we’ll first take a look at the file called Logger_server.skeleton.cpp:

// This autogenerated skeleton file illustrates how to build a server.
// You should copy it to another filename to avoid overwriting it.

#include "Logger.h"
#include <protocol/TBinaryProtocol.h>
#include <server/TSimpleServer.h>
#include <transport/TServerSocket.h>
#include <transport/TBufferTransports.h>

using namespace ::apache::thrift;
using namespace ::apache::thrift::protocol;
using namespace ::apache::thrift::transport;
using namespace ::apache::thrift::server;

using boost::shared_ptr;

using namespace logger;

class LoggerHandler : virtual public LoggerIf {
 public:
  LoggerHandler() {
    // Your initialization goes here
  }

  void log(const LogMessage &lm) {
    // Your implementation goes here
    printf("log\n");
  }

  void batch(const LogMessageBatch &lmb) {
    // Your implementation goes here
    printf("batch\n");
  }

  void getLastMessage(LogMessage &_return) {
    // Your implementation goes here
    printf("getLastMessage\n");
  }

};

int main(int argc, char **argv) {
  int port = 9090;
  shared_ptr<LoggerHandler> handler(new LoggerHandler());
  shared_ptr<TProcessor> processor(new LoggerProcessor(handler));
  shared_ptr<TServerTransport> serverTransport(new TServerSocket(port));
  shared_ptr<TTransportFactory> transportFactory(new TBufferedTransportFactory());
  shared_ptr<TProtocolFactory> protocolFactory(new TBinaryProtocolFactory());

  TSimpleServer server(processor, serverTransport, transportFactory, protocolFactory);
  server.serve();
  return 0;
}

Those paying attention here will notice that the skeleton code here matches the definitions given in the Thrift definition file. The logic which you need to fill in here inside the LoggerHandler defines how the server-side of your program runs. So here, fill in what you want to do with each incoming log message, throwing exceptions as necessary. This part should be fairly obvious. Also note that incoming Thrift objects are always defined as const reference (in C++, anyway), and functions which are supposed to return a Thrift object do so by storing the relevant information in a non-const reference variable.
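To make the const-reference-in, non-const-reference-out convention concrete, here is a minimal sketch that runs without the generated code. Everything in it is a stand-in of my own: DemoLogMessage mirrors the fields of LogMessage, and DemoLoggerHandler only imitates the shape of the generated interface.

```cpp
#include <cassert>
#include <stdint.h>
#include <string>

// Hypothetical stand-in for the generated LogMessage struct.
struct DemoLogMessage
{
  int64_t timestamp;
  std::string message;

  DemoLogMessage () : timestamp (0) {}
};

// Hypothetical handler with the same method shapes as the skeleton.
class DemoLoggerHandler
{
private:
  DemoLogMessage last_;
  bool has_last_;

public:
  DemoLoggerHandler () : has_last_ (false) {}

  // Inputs arrive as const references: read them and copy what you need.
  void
  log (const DemoLogMessage& lm)
  {
    last_ = lm;
    has_last_ = true;
  }

  // A struct "return value" is written into a non-const reference parameter.
  void
  getLastMessage (DemoLogMessage& _return)
  {
    // This sketch simply insists that something was logged first;
    // a real server would need a gentler policy than aborting.
    assert (has_last_);
    _return = last_;
  }
};
```

A caller declares an empty DemoLogMessage, passes it to getLastMessage(), and reads the fields afterwards, which is the same calling shape the skeleton's getLastMessage(LogMessage &_return) expects you to fill in.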

If you look inside the main() function at the bottom of the code, you’ll see some boilerplate code to define some of the necessary structures to get a Thrift server running. There are various other classes available to you to run from here as well; for example, instead of a TSimpleServer, you may want to run a server with multiple threads, such as the TThreadPoolServer. Depending on your install path, the options available to you should be somewhere like /usr/local/include/thrift/. Also note that the thrift classes make use of boost shared_ptrs very often.

Now, let’s just briefly implement the batch() function from above to give an idea how to interact with these Thrift objects.

void batch(const LogMessageBatch &lmb) {

  std::vector<LogMessage>::const_iterator lmb_iter = lmb.msgs.begin ();
  std::vector<LogMessage>::const_iterator lmb_end = lmb.msgs.end ();

  while (lmb_iter != lmb_end)
  {
    log (*lmb_iter++); // Use the other thrift-defined interface function to write to disk, whatever.
  }
}

What to note here:

  • You access the fields of each Thrift object directly
  • Remember that all the incoming Thrift objects and their fields are const
  • LogMessageBatch contains a ‘list’ of LogMessages in the definition file, but after C++ generation it’s defined as a ‘vector’. This is just Thrift’s choice of container to represent a list on the generated code end of things. I’m sure other languages share similar oddities.

The rest should be easy. Now, the big question is, how do we use this server? We need to use the generated client-side interface, defined in Logger.h and Logger.cpp. It’s called LoggerClient, and in order to use it, it needs to be instantiated with a Thrift protocol pointer. Here’s one way to do it:

#include "Logger.h"
#include <transport/TSocket.h>
#include <transport/TBufferTransports.h>
#include <protocol/TBinaryProtocol.h>

using namespace apache::thrift;

namespace logger
{
class LoggerClientInterface
{
private:
  LoggerClient *__client;

public:
LoggerClientInterface ()
{
  boost::shared_ptr<transport::TSocket> socket = boost::shared_ptr<transport::TSocket>
      (new transport::TSocket ("127.0.0.1", 9090)); // must match the server's port (9090 in the skeleton)

  boost::shared_ptr<transport::TBufferedTransport> transport
      = boost::shared_ptr<transport::TBufferedTransport>
        (new transport::TBufferedTransport (socket));

  boost::shared_ptr<protocol::TBinaryProtocol> protocol = boost::shared_ptr<protocol::TBinaryProtocol>
      (new protocol::TBinaryProtocol (transport));

  transport -> open ();
  __client = new LoggerClient (protocol);
}

void log (LogMessage &lm)
{
  __client -> log (lm);
}
};
}

After initializing the LoggerClient object, you can either use it directly, or wrap its functionality in another class, like LoggerClientInterface::log() above. And you’re done. Remember to include /usr/local/include/thrift and link to -lthrift, and you’re good to go.

Although there’s a fair bit of manual setup to get a Thrift service running, it’s actually a lot of boilerplate copy-and-paste, which you can easily move from one service to the next. After that, it’s pretty easy once you get the hang of it. Good luck!