Wednesday, August 28, 2019

A C++ Pitfall I was just caught by

It's been a long time since C++ gave me a real surprise. I'm not saying my C++ code was totally bug free, but the mistakes were commonly due to some kind of carelessness and could easily be identified, understood and fixed. Until today... So I think I should share it.

Let's say we have a vector of string:

std::vector<std::string> tokens;

Now I iterate over each element of the vector.

for (int i = 0; i < tokens.size(); i++) {
  ...
}

The index i is useful within the body of the loop. Everything goes as expected.

Then, for some reason, I want to run one more iteration than the number of tokens. So I change the code to:

for (int i = -1; i < tokens.size(); i++) {
  ...
}

Is this correct? No. And even worse, the loop is supposed to generate some signals along with other outputs, but it does not crash; it silently produces nothing. The code was for experiments, with no test cases covering it yet, so the bug was not found until I happened to carefully check the generated data. I went back and read the code again and again. The -1 was not a constant in the real code but was produced by an expression. All of a sudden, I realized the problem.

What is the problem then? Unsigned integers. std::vector::size (like all other sizes in std) returns a size_t, which is an unsigned integer type, while i is a signed int. In the condition expression i < tokens.size(), C++ first converts i to the unsigned type before comparing. So -1 becomes 0xFFFFFFFFFFFFFFFF (on a typical 64-bit platform), the condition is false from the start, and the loop body never runs at all.

This nearly wasted several days of my time. I began to miss Go's requirement of explicit conversions in this situation.
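
For comparison, here is a minimal Go sketch (the variable names are made up) of why the same slip is harder to make there: mixing a signed index with an unsigned size simply does not compile until you spell the conversion out, which is exactly the point where the -1 would catch your eye.

package main

import "fmt"

func main() {
  tokens := []string{"a", "b", "c"}
  size := uint(len(tokens)) // pretend the size is unsigned, as size_t is in C++

  // Writing "i < size" would not compile: mismatched types int and uint.
  // The required explicit conversion is where the -1 gets a second look.
  for i := -1; i < int(size); i++ {
    fmt.Println(i)
  }
}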

Monday, July 13, 2015

A light and fast type for serializing to a byte slice in Go

Sometimes we need to serialize a data structure into a byte slice ([]byte). The standard library provides the Buffer type in the bytes package. In GolangPlus, the ByteSlice type in the github.com/golangplus/bytes package can be a better alternative.

Actually, ByteSlice is nothing but a renamed []byte, i.e. simply

  type ByteSlice []byte

while a Buffer contains much more fields:

  type Buffer struct {
    buf []byte
    off int
    runeBytes [utf8.UTFMax]byte
    bootstrap [64]byte
    lastRead  readOp
  }

which means much more memory usage per instance. In some situations, this can be a problem.

Preparing a ByteSlice is much lighter than preparing a Buffer. Here is a benchmark for serializing 10 bytes of data:

  BenchmarkByteSliceWrite10       20000000 101 ns/op
  BenchmarkBytesBufferWrite10_New  3000000 460 ns/op
  BenchmarkBytesBufferWrite10_Def  3000000 474 ns/op

BenchmarkBytesBufferWrite10_New initializes the buffer with bytes.NewBuffer and a 10-byte-long byte slice, and BenchmarkBytesBufferWrite10_Def just declares a zero-value bytes.Buffer variable. The more-than-4x advantage of ByteSlice over Buffer comes from the difference in initializing the object.

Writing to a *ByteSlice is simply appending to the slice. For example, (*ByteSlice).WriteByte is implemented as follows:

  func (s *ByteSlice) WriteByte(c byte) error {
    *s = append(*s, c)
    return nil
  }

Compare it with the implementation of Buffer.WriteByte:

  func (b *Buffer) WriteByte(c byte) error {
    b.lastRead = opInvalid
    m := b.grow(1)
    b.buf[m] = c
    return nil
  }

The latter is much more complicated. Here is a benchmark showing the efficiency difference:

  BenchmarkByteSliceWrite1k   200000  9971 ns/op
  BenchmarkBytesBufferWrite1k 100000 11933 ns/op

At the same time, Buffer's overhead is not there for nothing: it supports UnreadByte and UnreadRune, which ByteSlice does not (they do need extra state to support). But if you don't need them, which is most of the time for me, ByteSlice is obviously the better choice.
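
Here is a usage sketch. It assumes, as the Write benchmarks above imply, that *ByteSlice also satisfies io.Writer by appending, in the same way as the WriteByte method shown above (only WriteByte itself is confirmed in this post):

  package main

  import (
    "fmt"

    bytesp "github.com/golangplus/bytes" // aliased explicitly; the actual package name may differ
  )

  func main() {
    var b bytesp.ByteSlice // the zero value is ready to use, like a nil []byte

    b.WriteByte('[')
    fmt.Fprintf(&b, "%d,%q", 42, "x") // assumes *ByteSlice satisfies io.Writer by appending
    b.WriteByte(']')

    b = append(b, '\n') // being a named []byte, plain append works as well

    fmt.Print(string(b)) // prints: [42,"x"] followed by a newline
  }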

Wednesday, June 17, 2015

An alternative design for "container/heap" in Go.

Here is an alternative design of the heap package in Go:


The main difference is that elements need not be converted to interface{}'s for pushing and popping. This reduces the overhead of conversion from value to interface and vice versa. The trick is to use the last element of the slice as the in/out place, so the interface never needs to touch the element itself. Push, Pop and Remove are replaced with PushLast, PopToLast and RemoveToLast, respectively.

An example of a heap with integers is like this:

type IntHeap []int

func (h *IntHeap) Pop() int {
  heap.PopToLast((*sort.IntSlice)(h))
  res := (*h)[len(*h) - 1]
  *h = (*h)[:len(*h) - 1]

  return res
}

func (h *IntHeap) Push(x int) {
  *h = append(*h, x)
  heap.PushLast((*sort.IntSlice)(h))
}

Pushing and popping elements is done by calling the corresponding methods on the IntHeap type, rather than by calling heap.Push(h, el). The code looks much clearer to me.
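
For example (a minimal sketch that assumes the IntHeap type above, with fmt imported and heap referring to the GolangPlus heap package):

h := IntHeap{}
for _, v := range []int{5, 2, 8, 3} {
  h.Push(v)
}
fmt.Println(h.Pop()) // 2: the smallest element, since sort.IntSlice orders ascending
fmt.Println(h.Pop()) // 3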

In the case where the element is a struct, the benchmark shows about a 10~15% performance increase from removing the conversion:


BenchmarkDataHeap_Plus    100    10541648 ns/op
BenchmarkDataHeap_Pkg     100    11921974 ns/op

where each element is of the type:

type Data struct {
  Value    string
  Priority int
}

The closure (func) versions of the functions are also defined, named PushLastF, PopToLastF and RemoveToLastF, respectively. Using them makes the calls faster than going through the interface.

The benchmark shows a 20~30% performance improvement using the closure version versus the builtin heap:

BenchmarkDataHeap_F       200    8875933 ns/op

Heaps of int/string/float64 are predefined, with support for a customized less function. Here are the benchmark results (showing a similar performance increase):

BenchmarkBuiltinIntHeap   300    5680077 ns/op
BenchmarkPlusIntHeap      300    4049263 ns/op
BenchmarkPlusIntHeap_Less 300    4933297 ns/op

Tuesday, February 24, 2015

A lock-free Java object pool

It was a little bit surprising to me that I could not find an article or library for a totally lock-free Java object pool by googling. The closest one is Lock Less Java Object Pool. So here comes my version of a totally lock-free Java object pool.

The code is shared as a public project on GitHub. Feel free to share or fork it.

The idea is simple. All returned objects (which need to be cached) are stored in a stack, which is held in an AtomicReferenceArray instance, with an AtomicInteger variable storing the top position of the stack. Allocating or freeing an object happens in two steps, with a possible retry:
  1. First, a change to the top variable is reserved.
  2. Then we try to get/put the element at the reserved position.
  3. Since some other thread may reserve the same place (the stack can expand and shrink), we may have to start over if step 2 fails.
Step 1 uses AtomicInteger's compareAndSet in a loop, and step 2 uses getAndSet/compareAndSet of AtomicReferenceArray. Both are lock-free.

This approach is also known as optimistic locking. It is useful when contention is a relatively rare event.

Here are some code fragments:

    public E alloc() {
        while (true) {
            // Try to reserve a cached object in objects.
            // Convention: objects[0 .. top-1] hold the cached objects.
            int n;
            do {
                n = top.get();
                if (n == 0) {
                    // No cached objects, allocate a new one.
                    return factory.alloc();
                }
            } while (!top.compareAndSet(n, n - 1));
            // Try to fetch the cached object at the reserved position.
            E e = objects.getAndSet(n - 1, null);
            if (e != null) {
                return e;
            }
            // It is possible that the reserved object was extracted before
            // the current thread tried to get it. Let's start over again.
        }
    }

    public void free(E e) {
        while (true) {
            // Try to reserve a place in this.objects for e.
            int n;
            do {
                n = top.get();
                if (n == objects.length()) {
                    // The pool is full; e is not cached.
                    factory.free(e);
                    return;
                }
            } while (!top.compareAndSet(n, n + 1));
            // Try to put e at the reserved place.
            if (objects.compareAndSet(n, null, e)) {
                return;
            }
            // It is possible that the reserved place was occupied before
            // the current thread tried to put e in it. Let's start over again.
        }
    }

Any comments are welcome!

Sunday, August 11, 2013

Comparison of Go data types for string processing.

String processing is a very common operation in applications. This post talks about some data types used for string processing in the Go language. The two main concerns are readability (or maintainability) and efficiency.

String type

String is no doubt the most direct type for string processing. Go provides many ways to represent a string literal in source code.

A string in Go is a series of immutable bytes. Many builtin string-related functions assume the text a string represents is encoded in UTF-8.

Each string object contains a pointer to the start of the byte buffer and the length of the string. A sub-string of a long string may share the same underlying buffer with the original.

Since a string is immutable, processing a string means creating new string objects. Two kinds of new strings may be created:
  1. A sub-string, e.g. sub := org[2:5]. In this case, a new string header is allocated, the pointer is set to some offset from the original pointer, and the length is computed and filled in.
  2. A totally new string, e.g. the concatenation of strings. The underlying buffer is newly allocated, and bytes are filled in as needed.
The following code fragment contains both string allocations:

    newStr := "prefix " + oldStr[1:10] + " suffix"

A string is comparable, so it can be used as a key in a Go map.

Byte slice type

In Go, a string can easily be converted to a byte slice using the usual type-conversion syntax:

    sl := []byte(str)

But one has to remember that this conversion allocates a new byte slice and copies the bytes into it. So changing the content of the slice will not affect the original string (otherwise the string would no longer be immutable). For the same reason, converting a byte slice back to a string also requires a newly allocated buffer. In other words, this conversion is not cheap.
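
A tiny illustration of the copy semantics (assuming fmt is imported):

    str := "hello"
    sl := []byte(str) // a new buffer is allocated and the 5 bytes are copied
    sl[0] = 'H'       // modifies the copy only
    fmt.Println(str, string(sl)) // prints: hello Hello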

Like other slice types in Go, a byte slice has a header containing a pointer to the buffer, the length of the slice, and the capacity of the slice. This means a byte slice consumes more metadata than a string.

But a slice is mutable: the content, i.e. the bytes, in a slice can be changed. So when you append sub-strings to a byte slice, the underlying buffer may stay unchanged, unless the length exceeds the capacity. A more commonly used pattern is the bytes.Buffer type (personally, I sometimes prefer the more lightweight "github.com/daviddengcn/go-villa".ByteSlice, which utilizes the Go type system and allocates fewer extra bytes). In some special cases, in-place operations can be performed on a byte slice.

A byte slice cannot be used as a key type in a map.

Byte array type

In some special situations where the length of a string is fixed, a byte array can be used. It needs no extra metadata bytes and is mutable. You can easily create a byte slice representing the whole or part of a byte array.

The content of a byte array is mutable, so in-place processing can be performed just like with a byte slice. But assignment of a byte array behaves like that of a struct, i.e. the whole array is copied to the destination.

A byte array type can be used as a key type in a map.
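
A short sketch of both properties, using a fixed-size [8]byte as a map key and a slice view for in-place edits (the names are made up for illustration):

    var key [8]byte
    copy(key[:], "user-42") // copy a short string into the fixed-size array

    counts := map[[8]byte]int{} // arrays are comparable, so they can serve as map keys
    counts[key]++

    view := key[:4] // a byte slice over part of the array; no copying involved
    view[0] = 'U'   // writes through to key itself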

Rune slice type

A string can be converted to a rune slice with the usual type-conversion syntax:

    rsl := []rune(str)

assuming the bytes are UTF-8 encoded. Besides the allocation of the rune slice, the contents are decoded into runes rather than merely copied. This is more expensive than the conversion between a string and a byte slice.
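
For example, decoding changes the element count, which a plain byte view does not (assuming fmt is imported):

    str := "héllo"
    fmt.Println(len(str))         // 6: the number of UTF-8 bytes ('é' takes two bytes)
    fmt.Println(len([]rune(str))) // 5: the number of decoded runes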

A rune slice cannot be used as a key type in a map.

Rune array type

Like the byte array type, a rune array is used to represent strings with a fixed/limited number of runes. There are no extra metadata bytes, and the contents are mutable.

A rune array can be used as a key type in a map.

Summary

Here is a table summarizing the differences among the above types, followed by the conversions between them. The variables for the byte array, byte slice, string, rune slice and rune array are ar, sl, st, rsl and rar, respectively.

              byte-array  byte-slice       string       rune-slice       rune-array
  type        [4]byte     []byte           string       []rune           [4]rune
  size        fixed       variable         variable     variable         fixed
  access      read/write  read/write       read only    read/write       read/write
  comparable  yes         no               yes          no               yes
  meta bytes  0           pointer+len+cap  pointer+len  pointer+len+cap  0

Conversions to each type (the source type is identified by the variable name; converting a type to itself is omitted):

  to byte-array:  copy(ar[:], sl), copy(ar[:], st), copy(ar[:], string(rsl)), copy(ar[:], string(rar[:]))
  to byte-slice:  ar[:], []byte(st), []byte(string(rsl)), []byte(string(rar[:]))
  to string:      string(ar[:]), string(sl), string(rsl), string(rar[:])
  to rune-slice:  []rune(string(ar[:])), []rune(string(sl)), []rune(st), rar[:]
  to rune-array:  copy(rar[:], []rune(string(ar[:]))), copy(rar[:], []rune(string(sl))), copy(rar[:], []rune(st)), copy(rar[:], rsl)

Best practice

Here are some principles, or best-practice advice, out of my own experience:
  1. When you are processing a very small string, e.g. some text to show on the console, using string directly is the best choice, since it is the most direct and the easiest to read.
  2. Reading from a long string can be done with a strings.Reader.
  3. When you have to do a lot of modification on a string, e.g. modifying some content inside a long text, consider doing the whole processing in variables of the byte slice type (see the sketch after this list). Text can be read as a byte slice, and a byte slice can be written out through an io.Writer instance. Go provides many builtin functions on byte slices. Don't convert the bytes to strings unless they have to be keys of a map.
  4. When possible, using a byte array is very efficient, e.g. when you need a large map whose keys are strings of a fixed/limited length.
  5. Rune slices should be used only when performance is not a consideration.
  6. Using a rune array is also efficient, when possible, but conversion to and from UTF-8 encoded bytes may be expensive.
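
As a sketch of item 3 (purely illustrative; the file name and the transformation are made up), the whole pipeline can stay in []byte from input to output:

    package main

    import (
        "bytes"
        "io/ioutil"
        "log"
        "os"
    )

    func main() {
        // Read the text directly as a byte slice; no string is ever created.
        data, err := ioutil.ReadFile("input.txt") // hypothetical file name
        if err != nil {
            log.Fatal(err)
        }

        // Process the bytes with the builtin bytes package.
        data = bytes.Replace(data, []byte("\r\n"), []byte("\n"), -1)

        // Write the byte slice out through an io.Writer.
        if _, err := os.Stdout.Write(data); err != nil {
            log.Fatal(err)
        }
    }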