The unreasonable difficulty of string-to-number

tl;dr - use strto* properly in C.  Jump to the bottom for gotchas.

                                     ~~~

String to integer conversion is trivial, or at least it should be.
As much as I love the C programming language, the standard library is
unfortunately well known for being quite bad at such trivial tasks.

The casual approach to string to integer conversion is what a lot of
people learned in their first programming course at school: atoi(3).
In case you don't know it, the problem with atoi(3) is that the user
is unable to tell apart a parsing error from a legitimate "0".
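
To make the ambiguity concrete, here is a minimal sketch.  The helper
name atoi_agrees is made up for illustration; it only reports whether
atoi(3) gives the same answer for two different inputs.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only: returns 1 when atoi()
 * yields the same value for both strings, 0 otherwise. */
int atoi_agrees(const char *a, const char *b)
{
    return atoi(a) == atoi(b);
}
```

With this, atoi_agrees("0", "not a number") returns 1: the valid zero
and the garbage input are indistinguishable to the caller.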

The right way of parsing an integer is by calling strtol(3) and
friends.  It is a family of functions, with one converter for each of
the wider arithmetic types (long, unsigned long, long long, and so on).
They all allow proper error checking, although it is a little tricky to
get it right.

This is the (strict) pattern I would recommend (example with strtoul):

    char *endptr = NULL;
    unsigned long value;

    errno = 0;
    value = strtoul(value_string, &endptr, base);
    if (errno)
        return handle_error(errno);

    /* to be more strict: reject empty input and trailing garbage */
    if (!endptr || endptr == value_string || *endptr)
        return handle_error(EINVAL);

    /* All good, we may use value */

In short, errno should be set to zero before calling the string
conversion function, and checked afterwards.  Optionally the endptr
parameter can be used to verify if the whole string was properly
parsed.
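
Why the errno dance is needed at all: on overflow strtoul returns
ULONG_MAX, which is also a perfectly legitimate result, so the return
value alone proves nothing.  A small sketch (the helper name
overflows_ulong is hypothetical, and base 10 is assumed):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `src` is too large for an
 * unsigned long, 0 otherwise.  Base 10 is assumed. */
int overflows_ulong(const char *src)
{
    unsigned long value;

    errno = 0;                      /* clear any stale error first */
    value = strtoul(src, NULL, 10);

    /* ULONG_MAX by itself is ambiguous; only errno tells an overflow
     * apart from a genuine ULONG_MAX in the input. */
    return value == ULONG_MAX && errno == ERANGE;
}
```

Without the reset, an ERANGE left over from an earlier call could be
mistaken for an error of this one.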

I find it incredible, but I keep seeing people doing it wrong.  The
most typical error I see is to neglect the (admittedly awkward) errno
dance.  It must be common enough for BSD to have come up with
strtonum(3) (available on other operating systems by linking against
libbsd).

                                     ~~~

Lately I've been working on an arguably bad codebase, which suffers,
among other things, from a sloppy approach to compiler warnings.  Some of
these warnings are related to integer conversion.  Adding a dependency on
libbsd is not an option.  I've decided to improve things by writing a
simple wrapper that implements the pattern above in a sensible way.

Here are my type-specific prototypes:

    int to_slong(signed long *dst, const char *src,
                 int base, signed long min, signed long max);
    int to_ulong(unsigned long *dst, const char *src,
                 int base, unsigned long min, unsigned long max);

    int to_sint(signed int *dst, const char *src,
                int base, signed int min, signed int max);
    int to_uint(unsigned int *dst, const char *src,
                int base, unsigned int min, unsigned int max);

    int to_sshort(signed short *dst, const char *src,
                  int base, signed short min, signed short max);
    int to_ushort(unsigned short *dst, const char *src,
                  int base, unsigned short min, unsigned short max);

    int to_schar(signed char *dst, const char *src,
                 int base, signed char min, signed char max);
    int to_uchar(unsigned char *dst, const char *src,
                 int base, unsigned char min, unsigned char max);
    int to_char(char *dst, const char *src,
                int base, char min, char max);

And here is my generic to_number, using the _Generic of C11[1]:

    #define to_number(dst, src, base, min, max) _Generic((dst), \
        signed long *     :      to_slong,                      \
        unsigned long *   :      to_ulong,                      \
        signed int *      :      to_sint,                       \
        unsigned int *    :      to_uint,                       \
        signed short *    :      to_sshort,                     \
        unsigned short *  :      to_ushort,                     \
        signed char *     :      to_schar,                      \
        unsigned char *   :      to_uchar,                      \
        char *            :      to_char)                       \
    (dst, src, base, min, max)
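
To see the dispatch in isolation, here is a reduced sketch of the same
technique.  The pick_* functions are hypothetical stand-ins for the
real converters: each one just returns a tag, so that the choice made by
_Generic is observable.

```c
/* Hypothetical stand-ins: the returned tag identifies which function
 * _Generic selected for a given pointer type. */
int pick_slong(long *dst)          { (void)dst; return 1; }
int pick_ulong(unsigned long *dst) { (void)dst; return 2; }
int pick_char(char *dst)           { (void)dst; return 3; }
int pick_schar(signed char *dst)   { (void)dst; return 4; }

/* Same shape as to_number: select a function by the type of `dst`,
 * then call it. */
#define pick(dst) _Generic((dst),  \
    long *          : pick_slong,  \
    unsigned long * : pick_ulong,  \
    char *          : pick_char,   \
    signed char *   : pick_schar)(dst)
```

Given `char c; signed char d;`, pick(&c) evaluates to 3 and pick(&d)
to 4.  A pointer type that is not listed (say, `long long *`) is a
compile-time error, since the selection has no default association.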

The implementation is a straightforward application of the pattern above.
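
For illustration, this is roughly what two of the converters could look
like.  It is a sketch under assumptions, not the actual code: the
return convention (0 on success, an errno value on failure) and the
choice of ERANGE for values outside [min, max] are guesses.

```c
#include <errno.h>
#include <stdlib.h>

/* Sketch only.  Assumed convention: return 0 on success or an errno
 * value on failure; *dst is left untouched on failure. */
int to_ulong(unsigned long *dst, const char *src,
             int base, unsigned long min, unsigned long max)
{
    char *endptr = NULL;
    unsigned long value;

    errno = 0;
    value = strtoul(src, &endptr, base);
    if (errno)
        return errno;
    if (endptr == src || *endptr != '\0')
        return EINVAL;          /* empty input or trailing garbage */
    if (value < min || value > max)
        return ERANGE;          /* outside the caller's range */

    *dst = value;
    return 0;
}

/* A narrower type can reuse the wide converter: the range check
 * guarantees that the value fits before it is narrowed. */
int to_uint(unsigned int *dst, const char *src,
            int base, unsigned int min, unsigned int max)
{
    unsigned long wide;
    int err = to_ulong(&wide, src, base, min, max);

    if (err)
        return err;
    *dst = (unsigned int)wide;
    return 0;
}
```

Passing the type's own limits as min and max (for example 0 and
UINT_MAX) turns the range check into a plain "does it fit" test.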


                                     ~~~

The interesting part of all this is what I learned about the C language:

1. There are a few strto* variants that I did not know about:
   strtoumax/strtoimax, declared in <inttypes.h>.

2. The `char` type is weird in that `char` always has the same range and
   representation as either `unsigned char` or `signed char`, yet it is a
   distinct type[2].

   It is possible to distinguish the two cases by checking the value of
   CHAR_MIN[3], and then cast accordingly.

   Basically this is how I implemented to_char:

       int to_char(char *dst, const char *src, int base, char min, char max)
       {
       #if CHAR_MIN == 0
           return to_uchar((unsigned char *)dst, src, base,
                (unsigned char)min, (unsigned char)max);
       #else
           return to_schar((signed char *)dst, src, base,
                (signed char)min, (signed char)max);
       #endif
       }

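3. This is more of a pitfall: the above preprocessor conditional compiles
   even if CHAR_MIN is not defined, because an undefined identifier in an
   #if expression evaluates to 0, so the unsigned branch is silently
   selected.  Be sure that <limits.h> is included.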

4. Bonus gotcha: strtoul happily accepts negative integers.  In other
   words, `strtoul("-123", NULL, 10)` will return `(unsigned long)-123`.
   This is not specific to glibc: the standard (7.22.1.4p5) says that if
   the input begins with a minus sign, the converted value is negated in
   the return type.

   Astonished at first, I started to find it reasonable, after reading
   paragraph 6.3.1.3 of the C standard:

       When a value with integer type is converted to another integer type
       other than _Bool, if the value can be represented by the new type, it
       is unchanged.

       Otherwise, if the new type is unsigned, the value is converted by
       repeatedly adding or subtracting one more than the maximum value that
       can be represented in the new type until the value is in the range of
       the new type.

       Otherwise, the new type is signed and the value cannot be represented
       in it; either the result is implementation-defined or an
       implementation-defined signal is raised.

   Still, this can lead to nasty surprises!
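
If negative input should be an error for unsigned types, one option is
to look for the minus sign before calling strtoul.  The sketch below is
one possible approach, not necessarily what the wrapper described
earlier does; the name parse_ulong_nonneg and the choice of EINVAL are
made up.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical strict variant of the errno/endptr pattern: refuses a
 * leading minus sign instead of letting strtoul negate the value.
 * Returns 0 on success or an errno value on failure. */
int parse_ulong_nonneg(unsigned long *dst, const char *src, int base)
{
    const char *p = src;
    char *endptr = NULL;
    unsigned long value;

    /* strtoul skips leading white space before the optional sign, so
     * do the same before looking for '-'. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '-')
        return EINVAL;

    errno = 0;
    value = strtoul(p, &endptr, base);
    if (errno)
        return errno;
    if (endptr == p || *endptr != '\0')
        return EINVAL;

    *dst = value;
    return 0;
}
```

A leading '+' is still accepted, as it is by strtoul itself.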

                                     ~~~

A big shout out to the good folks in #c, on freenode!


References:

[1] https://en.cppreference.com/w/c/language/generic
[2] https://www.iso-9899.info/n1570.html#6.2.5p15
[3] http://www.iso-9899.info/n1570.html#FOOTNOTE.45