Data types

Q is designed for speed with very large data sets. Loose datatyping would compromise efficiency, because primitives would have to cast silently from one type to another. Fine-grained datatypes keep automatic type conversion to a minimum.

Atoms

An atom is the smallest unit of data. An atom cannot be indexed.

There are 18 types of data atoms.

boolean     0b
guid        52cb20d9-f12c-9963-2829-3c64d8d8cb14
byte        0x00
short       0h
int         0i
long        0j, 0 
real        0e
float       0.0, 0f 
char        " "
symbol      `
timestamp   2023.03.16D11:51:26.923000000 
month       2000.01m
date        2000.01.01
datetime    2023.03.16T11:52:07.320
timespan    00:00:00.000000000
minute      00:00
second      00:00:00
time        00:00:00.000

The type keyword returns the type of its argument as a short. A negative value indicates an atom.

q)type each (3;3.14159;"q";`q;2023.03m)
-7 -9 -10 -11 -13h
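
As a small sketch of the sign convention, the lines below compare atoms with vectors of the same datatype. The values are arbitrary illustrations, not taken from any particular data set.

```q
/ type is negative for an atom and positive for a simple vector of that datatype
show type 3            / -7h  long atom
show type 3 4 5        / 7h   long vector
show type "q"          / -10h char atom
show type "qq"         / 10h  char vector
show type enlist 3     / 7h   a one-item vector is still a vector
show type (3;"q")      / 0h   a mixed (general) list
```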

True and False

The booleans 1b and 0b are True and False, but other datatypes are truthy too: any zero value counts as False and any other value as True.

q)"cfjmpu"$/:0 1
"\000" 0f 0 2000.01m 2000.01.01D00:00:00.000000000 00:00
"\001" 1f 1 2000.02m 2000.01.01D00:00:00.000000001 00:01
q)not "cfjmpu"$/:0 1
111111b
000000b

Nulls

Every datatype except boolean has its own null value.

Taking the first item of an empty vector returns the null for that datatype.

q)show now:.z.d+.z.t
2023.10.09D21:15:57.508000000
q)first each 0#'(1;now;`abc;"abc")
0N
0Np
`
" "

Tok returns the null of the target datatype from an empty string, except for booleans, which have no null.

q)"J"$""
0N
q)1 null\"B"$""
00b
q)"G"$""
00000000-0000-0000-0000-000000000000
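
For comparison, here is a brief sketch of Tok on non-empty strings. Text that does not parse as the target datatype also yields that datatype's null. The sample strings are made up for illustration.

```q
/ Tok with an upper-case letter parses text as an atom of that datatype
show "J"$"42"                    / 42
show "J"$"abc"                   / 0N: not a valid long, so the long null
show null "J"$("42";"abc";"")    / 011b: only the first string parsed
```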

The null keyword indicates null values.

q)null first each 0#'(1;now;`abc;"abc";1b)
11110b

In noun syntax the Identity operator :: denotes the general null.
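
A minimal sketch of the general null used as a noun, that is, wrapped in parentheses. One common use, shown here as an illustration, is as the first item of a list to stop the list collapsing to a simple vector.

```q
show type (::)         / 101h: the general null has a function type, not a data type
show type (1;2)        / 7h:   two longs collapse to a long vector
show type (::;1;2)     / 0h:   with the general null in front, the list stays general
```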

Infinities

The min and max of an empty vector return the positive and negative infinities of its datatype.

q)(min;max)@\:0#99h    / short
0W -0Wh
q)(min;max)@\:0#99i    / int
0W -0Wi
q)(min;max)@\:0#99.    / float
0w -0w
q)(min;max)@\:0#now    / timestamp
0W -0Wp

Casting infinities

An infinity becomes a finite number when cast to a broader datatype.

This can surprise you if, for example, an infinity is promoted for insertion into a vector of broader type.

q)"j"$(min;max)@\:0#99h
32767 -32767
q)@[til 10;3 7;:;] "j"$99 0Wh
0 1 2 99 4 5 6 32767 8 9

Strings and symbols

There is no String datatype.

What qbists call a string is a vector of characters. (See Data Structures.)

The closest thing in q to an immutable string is a symbol.

A symbol is written as a backtick followed by zero or more characters.

It is used to enumerate repeated strings such as stock codes.

`goog   / Google
`ibm    / IBM
`msft   / Microsoft

A symbol atom displays as text, but it is an enumeration: the underlying data is an integer index into a master list of strings called the symlist.
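
The difference between a string and a symbol can be sketched as follows. The names txt and tkr are hypothetical, chosen only for this illustration.

```q
txt:"ibm"                / a string: a vector of three chars
tkr:`ibm                 / a symbol: a single atom
show type txt            / 10h
show type tkr            / -11h
show count txt           / 3
show count tkr           / 1
show txt 0               / "i": a string can be indexed, a symbol atom cannot
show tkr~`$txt           / 1b: casting a string to symbol
show txt~string tkr      / 1b: string recovers the characters
```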

Literal and display forms

The literal form of a data type is how you write its atoms. Its display form is how the interpreter writes its atoms.

As far as is practical, literal and display forms are identical.

There are exceptions:

  • The j suffix is optional in the literal form of longs and omitted in the display form.
  • A decimal point in a float makes an f suffix optional in the literal and omitted in the display.
  • GUIDs have no literal form.
  • Display forms of primitives defined in k, such as til, are their k definitions.

In the display of a vector the type suffix appears once, at the end, rather than on every item.
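
The first two exceptions in the list above can be checked with Match. This is a small sketch using arbitrary values.

```q
show 42j~42      / 1b: the j suffix is optional in a long literal
show 3.0~3f      / 1b: a decimal point makes the f suffix optional
show 42j         / displays as 42, without the suffix
show 3.5         / displays as 3.5, without the suffix
show 3.0         / displays as 3f: with no fractional part, the display keeps the suffix
```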

Casting between datatypes

The Cast operator $ casts data between types.

Its left argument is the target datatype, as a short, char, or symbol from the table of datatypes.

q)9h$999
999f 
q)"j"$3.14159 
3
q)`byte$"q" 
0x71
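
The three forms of the left argument are interchangeable. A short sketch, with a hypothetical variable v, casting to long by type number, char, and symbol:

```q
v:3.7
show 7h$v                / 4: target given as a short
show "j"$v               / 4: target given as a char
show `long$v             / 4: target given as a symbol
show (7h$v)~`long$v      / 1b: same result whichever form is used
```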

The string keyword returns a string representation of an atom.

q)"c"$3.14159       / cast to char
"\003"
q)string 3.14159    / string representation
"3.14159" 
q)string `ibm 
"ibm"
q)string 2000.01.01 
"2000.01.01"

Temporal data

All temporal datatypes have underlying numeric representations.
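
Casting to long exposes the underlying numbers. The sketch below uses values close to the zero point of each datatype so that the units are easy to read off.

```q
show "j"$2000.01.01            / 0: dates count days from 2000.01.01
show "j"$2000.01.02            / 1
show "j"$2000.02m              / 1: months count from 2000.01m
show "j"$00:01                 / 1: minutes
show "j"$00:00:01.000          / 1000: a time counts milliseconds
show "j"$0D00:00:01            / 1000000000: a timespan counts nanoseconds
show 2000.01.02 - 2000.01.01   / 1i: subtracting dates gives a number of days
```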

Dot notation is a convenience for temporal data.

q)show now:.z.d+.z.t
2023.10.09D18:31:07.007000000
q)(now.minute;now.second;now.month)
18:31
18:31:07
2023.10m

Functions are atoms but they are not data atoms

Functions are first-class objects in q, so they too are atoms. But they are not data atoms.

Functions include operators, keywords, projections, compositions, and lambdas. Each has its own datatype.

q)type each (+;2*;type;+/;{x*y+z})
102 104 101 107 100h 
q)/operator;projection;keyword;derived function;lambda

Exercises

No peeking.

Attempt each exercise before looking at its answer.

The exercises clarify your thinking. Reading answers does not.

  1. Review the Datatypes reference. Investigate anything unfamiliar.

  2. List everything you learned from the previous exercise.

  3. Distinguish between Tok, Cast and string: write a definition of each.

    Answer

    Cast is atomic: it casts each atom of y to type x.

    Tok is string-atomic: it interprets each string in y as an atom of type x. (x is an upper-case letter.)

    string is atomic: it represents each atom of its argument as a string.

  4. What is the relationship between Tok and string? Apply string to a string. What does it return, and why does it not return its argument?

    Answer

    Tok and string are inverses of each other. Tok interprets strings as data atoms; string represents data atoms as strings.

    q)string "string"
    ,"s"
    ,"t"
    ,"r"
    ,"i"
    ,"n"
    ,"g"

    Above, string represents each atom of "string" as a string, returning six 1-character strings.

    Casting a string to char is a no-op.

    q)"c"$"string"
    "string"
    
  5. Show that Cast is left-atomic.

    Answer
    q)(type'')("h";("ij");"f")$4
    -5h
    -6 -7h
    -9h

    Above, the left argument of Cast is a nested list. Obeying scalar extension, Cast pairs each atom of that list with its atom right argument, so the result has the same structure as the left argument.

  6. What datatype does not have a literal form? How could you write one of its atoms?

    Answer

    GUIDs have no literal form.

    q)52cb20d9-f12c-9963-2829-3c64d8d8cb14
    '52cb20d9
      [0]  52cb20d9-f12c-9963-2829-3c64d8d8cb14
           ^

    You can parse a string (char vector) as a GUID. (Note the use of Tok, with an upper-case "G", rather than Cast.)

    q)"G"$"52cb20d9-f12c-9963-2829-3c64d8d8cb14"
    52cb20d9-f12c-9963-2829-3c64d8d8cb14
    
  7. List all the temporal datatypes with their units.