Little Step

Recently I ran into a problem with log rotation.

We run an open source service on our cloud. It works well functionally, but on the engineering side it does not support log rotation.

Furthermore, the maintainer thinks the feature is trivial and does not even plan to support it.

We have at least two solutions for this problem: a) change the source code and add the functionality ourselves, or b) use external tools.

Solution a) is straightforward: we only need to add file size and file count checks when writing logs to a file.

But it will cause merge problems whenever upstream updates, not to mention the license issue. So solution a) is my last resort.

For solution b), the most common tool is the logrotate service on Linux servers.

logrotate runs periodically; when a log file reaches the configured size, it renames the file, sends a signal to the process so that it creates a new log file, and removes old log files.

The very popular nginx uses this solution for log rotation.
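To illustrate, here is roughly what one rotation run does for nginx (a sketch only; the paths are examples, and USR1 is the signal nginx handles by reopening its log files):

mv /var/log/nginx/access.log /var/log/nginx/access.log.1      # rename the full log file
kill -USR1 "$(cat /var/run/nginx.pid)"                         # ask nginx to reopen its log files
find /var/log/nginx -name 'access.log.*' -mtime +30 -delete    # drop archives older than 30 days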

To use this solution, the service must handle a signal to reopen its log file, which mine doesn't.

Then I found multilog, and my problem was solved.

multilog is similar to tee, which reads from stdin and writes to stdout and files; multilog reads from stdin and writes to files, with extra features such as filtering, adding timestamps, etc.

My service doesn't support log rotation, but it can write logs to stdout.

With this, I can use the command below to achieve log rotation:

# s<size in bytes>: maximum size of the current log file before rotation
# n<number>: keep at most <number> log files
my_app 2>&1 | multilog s100000000 n10 /var/log/app

Multilog is part of daemontools, which is a nice package, and worth a look.
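For reference, when a service runs under daemontools' supervise, the conventional setup is a service directory with a run script plus a log/run script, and supervise pipes the service's stdout into the logger. A minimal sketch, with a hypothetical /service/my_app directory:

#!/bin/sh
# /service/my_app/run -- start the service, redirecting stderr into the pipe
exec ./my_app 2>&1

#!/bin/sh
# /service/my_app/log/run -- supervise feeds the service's stdout to this logger
exec multilog s100000000 n10 /var/log/app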

#log #multilog #daemontools

June 5th is my last working day at my current company, but not my last day. Because I have worked many weekends and have not taken much annual leave, I have more than 20 days of leave left to take.

Even though it is the last working day, I am very busy. Not busy with handover, but with daily work: making code changes, writing documents, attending technical meetings. There is not much handover work left, because I have already handed things over to others in the course of daily work.

The only reason is the panic over my leaving and the doubt about other colleagues, so they asked me to work until the last minute of the day.

It reminds me of my last day at another company several years ago.

I was assigned many bugs to fix the day before my last day, and a technical discussion was even scheduled for 10:00 on the last day itself.

Many people would have already finished their handover and been free of work several days, or even weeks, before their last day.

I wonder why I was so busy during my last days.

One reason may be that I was capable at my work, and others would take more time than me. The second is a management problem: the teams don't have enough people, and they always want to squeeze more work out of everyone, even on the last day.

I feel I've made a very wise decision to leave.

#work #career

If you write a bash script and want to keep a log of its run, what is the easy way?

You can just echo the log messages to stderr or stdout and add redirections when calling the script.

You can also define a log function and call it throughout the script, but that is a little more complex.

Recently, I learned an easy way to log messages in a simple shell script: using the exec builtin command.


#!/usr/bin/env bash

# add this at the beginning of the script
exec > /tmp/debug.log 2>&1

# other commands may echo to stderr or stdout
if [ ... ]; then
    echo "success"
else
    echo "fail"
fi

Here is the help text for the exec command:

exec: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]
    Replace the shell with the given command.

    Execute COMMAND, replacing this shell with the specified program.
    ARGUMENTS become the arguments to COMMAND.  *If COMMAND is not specified,
    any redirections take effect in the current shell.*

    Options:
      -a name	pass NAME as the zeroth argument to COMMAND
      -c	execute COMMAND with an empty environment
      -l	place a dash in the zeroth argument to COMMAND

    If the command cannot be executed, a non-interactive shell exits, unless
    the shell option `execfail' is set.

    Exit Status:
    Returns success unless COMMAND is not found or a redirection error occurs.

Note this sentence: "If COMMAND is not specified, any redirections take effect in the current shell."

So it is very useful for simple scripts that just run a few commands as a workflow, and it is especially convenient for scripts embedded in config files.
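If you also want to keep the output visible on the terminal while writing the log file, one variation is to combine exec with tee through process substitution (a sketch; the log path is just an example):

#!/usr/bin/env bash

# duplicate stdout and stderr: everything goes to the terminal and to the log file
exec > >(tee -a /tmp/debug.log) 2>&1

echo "this line appears on the screen and in /tmp/debug.log"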

#Shell #Bash #Linux

Recently we found our Redis service under heavy load, with very high CPU usage (80%–90%) and QPS (100k requests/second).

Although the number of clients and users is increasing, the pace is not as fast as the growth in load.

After checking the slow log and the monitoring graphs, we found several causes:

  1. Use of poorly performing commands in the production environment, such as KEYS, ZRANGE, SMEMBERS

  2. Complicated tasks in Lua scripts

  3. High QPS for the EXISTS command

For the first cause, we replaced those commands with their SCAN-family equivalents: SCAN, ZSCAN, and SSCAN.
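For example, instead of running KEYS against production, the keyspace can be walked incrementally from a shell (a sketch; the 'user:*' pattern is made up for illustration):

# iterate the keyspace in batches instead of blocking Redis with KEYS 'user:*'
cursor=0
while true; do
    reply=$(redis-cli SCAN "$cursor" MATCH 'user:*' COUNT 1000)
    cursor=$(echo "$reply" | head -n 1)   # first line of the reply is the next cursor
    echo "$reply" | tail -n +2            # remaining lines are the matched keys
    [ "$cursor" = "0" ] && break          # cursor 0 means the scan is complete
done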

As for Lua scripts, the only reason we use them is to keep commands in one transaction. But if the logic is very complicated, and especially if it involves the poorly performing commands above, it hurts performance badly. The solution is to split the big script into smaller scripts or even plain commands.

As for the EXISTS command, we use it as an ID check in many cases. We solved this by keeping a copy of these IDs in memory and syncing them with Redis periodically.

Reviewing the causes of the bad performance, they are all very basic mistakes. The reason we didn't avoid them is that we paid little attention to performance compared with business logic. But as the number of clients grows, the problem will only get worse. So it is better to solve it at the beginning, in the design phase, rather than in a hurry just to get the work done.

#Redis #Database #NoSQL

Recently we had a case that required copying keys stored in Redis, so I checked the commands Redis supports.

The first is RENAME, not exactly a copy, but a simple way to rename the old key to a new one.

Then there are the DUMP and RESTORE commands. DUMP serializes the value stored at a key, while RESTORE deserializes the content and creates the associated value.

A simple example is like this:

127.0.0.1:6379> FLUSHDB
OK
127.0.0.1:6379> HSET mykey foo f1 bar b1
(integer) 2
127.0.0.1:6379> DUMP mykey
"\r\x1d\x1d\x00\x00\x00\x18\x00\x00\x00\x04\x00\x00\x03foo\x05\x02f1\x04\x03bar\x05\x02b1\xff\t\x00#\xcf\xc4\xb5\xed6s\xa0"
127.0.0.1:6379> RESTORE newkey 0 "\r\x1d\x1d\x00\x00\x00\x18\x00\x00\x00\x04\x00\x00\x03foo\x05\x02f1\x04\x03bar\x05\x02b1\xff\t\x00#\xcf\xc4\xb5\xed6s\xa0"
OK
127.0.0.1:6379> HGETALL newkey
1) "foo"
2) "f1"
3) "bar"
4) "b1"
127.0.0.1:6379> HGETALL mykey
1) "foo"
2) "f1"
3) "bar"
4) "b1"
127.0.0.1:6379>

RESTORE also supports several options, such as setting a TTL and replacing an existing key.

There is also a command called MIGRATE, which transfers one or more keys from one Redis instance to another. It effectively executes a DUMP and DEL in the source instance and a RESTORE in the target instance.
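A hedged example of what a call looks like (the target address 10.0.0.2:6379 and the 5000 ms timeout are made up; COPY keeps the source key and REPLACE overwrites an existing key on the target):

# copy mykey to another instance's database 0 without deleting the source key
redis-cli -h 127.0.0.1 -p 6379 MIGRATE 10.0.0.2 6379 mykey 0 5000 COPY REPLACE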

Here is a case about a cluster of REST servers.

It provides interfaces for both clients and other backend servers.

It supports both the keep-alive and close types of HTTP connections.

In front of these REST servers, two SLBs handle the load balancing: one with a public IP for clients, and the other with an internal IP for backend servers.

The client side only needs low QPS, while the backend side has a huge QPS requirement.

The simple solution is to add all the REST servers to both SLBs.

But I wonder: if I divide the REST servers into two groups, each group serving only one SLB, does that have any advantage over the first solution?

The main advantage is that the client side, which is more important, will not be impacted by the backend traffic.

But the different QPS levels may leave the client group lightly loaded while the backend group is heavily loaded.

And the utilization rate of the client-side REST servers is low compared with the first solution.

Which one is better? Is it necessary to group servers?

In a recent project, we needed to build a C++ HTTP server with some REST logic. Why C++ rather than another language or framework? The simple reason is that it is the language I know best, and the fastest way for me to build a system like this.

So for C++, which library or framework should I use to build this system? There is Boost, but I think it is too heavy. I wanted to find a lighter one, and that is the POCO C++ Libraries.

POCO Libraries

Here is an overview diagram of POCO:

[POCO overview diagram]

It has basic STL-like libraries, as well as libraries for XML/JSON, networking, database access, etc.

For my purpose, POCO has an HTTPServer class, which makes it very easy to set up an HTTP server.

// first, declare a class derived from HTTPRequestHandler
// and implement the handleRequest function
class SimpleRequestHandler: public HTTPRequestHandler
{
public:
    void handleRequest(HTTPServerRequest& request, HTTPServerResponse& response) {
        // implement the request handling code here
    }
};

// second, create a factory class that creates handlers
class SimpleRequestHandlerFactory: public HTTPRequestHandlerFactory
{
public:
    HTTPRequestHandler* createRequestHandler(const HTTPServerRequest& request) {
        if (request.getURI() == "/hello")
            return new SimpleRequestHandler;
        return nullptr;  // no handler for other URIs
    }
};

// and then create the HTTP server
auto ip = config().getString("listen.ip", "0.0.0.0");
auto port = (unsigned short)config().getInt("listen.port", 9980);
PocoSocketAddress addr(ip, port);

int maxQueued = config().getInt("maxQueued", 1000);
int maxThreads = config().getInt("maxThreads", 30);
ThreadPool::defaultPool().addCapacity(maxThreads);

HTTPServerParams* pParams = new HTTPServerParams;
pParams->setKeepAlive(true);
pParams->setKeepAliveTimeout(Timespan(1 * Timespan::MINUTES + 15 * Timespan::SECONDS));  // 75s
pParams->setMaxQueued(maxQueued);
pParams->setMaxThreads(maxThreads);
PocoServerSocket svs(addr);  // set-up a server socket
PocoHTTPServer srv(new SimpleRequestHandlerFactory(), svs,
                   pParams);  // set-up a HTTPServer instance
srv.start();                  // start the HTTPServer
waitForTerminationRequest();  // wait for CTRL-C or kill
srv.stop();                   // Stop the HTTPServer

The above code works well, with only one problem: the server is thread-pool based, which means it is not very high performance.

POCO does have a SocketReactor class for this kind of event-driven I/O, but it only works at the TCP level; it is not something you can use directly for an HTTP server.

So then we found Proxygen.

Proxygen

A simple introduction to proxygen: it is a set of HTTP server libraries from Facebook.

The first reason for using proxygen is that its HTTPServer class has an interface very similar to POCO's, like this:

auto port = (unsigned short)config().getInt("listen.port", 9980);
auto ip = config().getString("listen.ip", "0.0.0.0");
auto threads = config().getInt("thrnum", 0);
if (threads <= 0) {
  threads = sysconf(_SC_NPROCESSORS_ONLN);
}
std::vector<proxygen::HTTPServer::IPConfig> IPs = {
    {FollySocketAddress(ip, port, true), Protocol::HTTP},
};

HTTPServerOptions options;
options.threads = static_cast<size_t>(threads);
options.idleTimeout = std::chrono::milliseconds(60000);
options.shutdownOn = {SIGINT, SIGTERM};
options.enableContentCompression = false;
options.handlerFactories = RequestHandlerChain().addThen<NewHandlerFactory>().build();
options.h2cEnabled = false;
proxygen::HTTPServer server(std::move(options));
server.bind(IPs);
server.start();  // blocking call

And the second reason is that its introduction article mentions WebSocket support.

However, at this point, it seems that choosing proxygen was not such a good idea.

It seems they dropped WebSocket support, or only use it internally and never open sourced it. There are several pull requests on the GitHub page, some from years ago, that were never accepted. It seems Facebook has not maintained this part of the project for a long time.

Besides the missing WebSocket support, there is another problem with proxygen: the dependencies. It has many of them, such as libevent, Boost (several libraries, not all), double-conversion, gflags, glog, Facebook folly, and Facebook wangle.

folly

Facebook folly is an open-source C++ library from Facebook that contains a variety of core library components, much like POCO.

The feature we use most is folly futures. They are based on the C++11 concepts of promise and future, extended with non-blocking then-continuations.

Since proxygen is asynchronous, all the actions in the HTTP server are non-blocking. The easy way to write async code is with many callbacks. The direct advantage of folly futures is that you can chain then-style continuations instead of nesting callbacks, like this:

// fut2 is assumed to be a Future<string>
Future<Unit> fut3 = std::move(fut2)
  .thenValue([](string str) {
    cout << str << endl;
    return str;  // pass the value on so the next stage still sees a string
  })
  .thenTry([](folly::Try<string> strTry) {
    cout << strTry.value() << endl;
  })
  .thenError(folly::tag_t<std::exception>{}, [](std::exception const& e) {
    cerr << e.what() << endl;
  });

Though folly futures work well, there is one big problem: the stack depth. If your application core dumps and you use gdb to find the error, you will see many levels of frames in the stack, and the debugging procedure is not a happy journey.

#c++ #poco #proxygen #folly #future

We use nginx with the very popular nginx-rtmp module for live streaming. When one server is not enough, we add an SLB in front of all these nginx servers.

Typically we use an Aliyun SLB instance with layer 4 (TCP) load balancing. But one limitation of Aliyun SLB is that the maximum session persistence time is 3600s (1 hour). If you publish a stream via the SLB for over one hour and the connection drops, when you retry you will probably land on a different backend server.

For a simple RTMP stream this is no problem, but if you want to record the stream, it becomes impossible to merge all the recorded files together.

So after some searching on GitHub, I found two modules that are very useful for solving this problem. We replaced the SLB with nginx servers, using the TCP stream module with health checks and consistent hashing (a config sketch follows the build steps below).

The other module is also a health check module, but for HTTP upstreams.

Below are the simple steps to build the modules with the nginx source.

./configure --prefix=/opt/nginx \
            --with-select_module \
            --with-poll_module \
            --with-http_ssl_module \
            --with-http_realip_module \
            --with-http_xslt_module \
            --with-http_sub_module \
            --with-http_dav_module \
            --with-http_flv_module \
            --with-http_gzip_static_module \
            --with-http_stub_status_module \
            --with-stream \
            --with-debug \
            --add-module=../nginx-rtmp-module-1.2.1 \
            --add-module=../nginx_upstream_check_module \
            --add-module=../ngx_stream_upstream_check_module
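Here is a rough sketch of how the stream proxy in front of the RTMP servers could be configured. The backend addresses are placeholders, and the check directive comes from the upstream check modules above, so its exact parameters may differ:

stream {
    upstream rtmp_backend {
        hash $remote_addr consistent;    # consistent hashing keeps a publisher on the same backend
        server 10.0.0.11:1935;
        server 10.0.0.12:1935;
        check interval=3000 rise=2 fall=3 timeout=1000 type=tcp;  # health check from the check module
    }

    server {
        listen 1935;
        proxy_pass rtmp_backend;
    }
}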

Nginx RTMP Module

Stream Upstream Check Module

HTTP Upstream Check

#nginx #slb #upstream

As a C++ programmer of many years, I had never noticed the alternative representations of C++ operators until today.

While reviewing code written by a colleague who is mainly a Java programmer, I found that he uses and instead of && in if statements.

I checked cppreference and found the table of C++ alternative operator representations.

Primary  Alternative
&&       and
&=       and_eq
&        bitand
|        bitor
~        compl
!        not
!=       not_eq
||       or
|=       or_eq
^        xor
^=       xor_eq
{        <%
}        %>
[        <:
]        :>
#        %:
##       %:%:

The reason for the alternative operators is that source code may be written in character sets that lack the characters needed for the primary operators.

#cpp #programming #operator