4. stars data model

This vignette explains the data model of stars objects, illustrated using artificial and real datasets.

Stars objects

stars objects consist of

a (possibly empty) named list of arrays, each having named dimensions
an attribute called dimensions with a dimensions object carrying dimension metadata
a class name that includes stars

A dimensions object is a named list of dimension elements, each describing the semantics of one dimension of the data arrays (space, time, type etc). In addition, a dimensions object has an attribute called raster of class stars_raster, which is a named list with three elements:

dimensions length 2 character; the dimension names that constitute a spatial raster (or NA)
affine length 2 numeric; the two affine parameters of the geotransform (or NA)
curvilinear a boolean indicating whether a raster is a curvilinear raster (or NA)

The affine and curvilinear values are only relevant for raster data, which is the case when dimensions has non-NA values.
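The components listed above can be inspected directly on a small object. The following is a minimal sketch, assuming the stars package is installed; the object name s_demo is only illustrative, and the raster attribute may carry additional elements in some package versions.

```r
suppressPackageStartupMessages(library(stars))

# Build a small stars object from a matrix with named dimensions
m_demo <- matrix(1:20, nrow = 5, ncol = 4)
dim(m_demo) <- c(x = 5, y = 4)
s_demo <- st_as_stars(m_demo)

class(s_demo)                     # the class name includes "stars"
names(s_demo)                     # names of the list of arrays
dim(s_demo[[1]])                  # each array has named dimensions
d_demo <- attr(s_demo, "dimensions")
class(d_demo)                     # the dimensions object
names(d_demo)                     # one dimension element per array dimension
r_demo <- attr(d_demo, "raster")
class(r_demo)                     # stars_raster
names(r_demo)                     # includes affine, dimensions, curvilinear
```

Using attr() rather than a helper keeps the sketch close to the description of the data model; st_dimensions(s_demo) returns the same dimensions object.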

A dimension object describes a single dimension; it is a list with named elements

from: (numeric length 1): the start index of the array
to: (numeric length 1): the end index of the array
offset: (numeric length 1): the start coordinate (or time) value of the first pixel (i.e., a pixel/cell boundary)
delta: (numeric length 1): the increment, or cell size
refsys: (character, or crs): object describing the reference system; e.g. the PROJ string, or string POSIXct or PCICt (for 360 and 365 days/year calendars), or object of class crs (containing both EPSG code and proj4string)
point: (logical length 1): TRUE if cells/pixels refer to points/instances, FALSE if they refer to areas/periods (may be NA)
values: one of
- NULL,
- a vector with coordinate values (numeric, POSIXct, PCICt, or sfc),
- an object of class intervals (a list with two vectors, start and end, with interval start- and end-values), or
- a matrix with longitudes or latitudes for all cells (in case of curvilinear grids)

Clearly, offset and delta only apply to regularly discretized dimensions, and are NA if this is not the case. from and to will usually be 1 and the dimension size, but from may be larger than 1 in case a regular sub-grid got cut out or was cropped. Rectilinear and curvilinear grids need grid values in values; this can be irregularly spaced coordinate values, or coordinate intervals of irregular width, or spatial geometries encoded in an sfc vector (“list-column”), or a matrix with grid cell centre values (longitude or latitude) for curvilinear grids.
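For a regular dimension, the relation between from, to, offset and delta can be sketched in base R. The helper names dim_edges and dim_centres are hypothetical and not part of stars; the sketch assumes that offset always refers to index 1, so that a cropped sub-grid keeps its offset and only changes from and to.

```r
# Start edge of each cell: offset + (index - 1) * delta
dim_edges <- function(d) d$offset + (seq(d$from, d$to) - 1) * d$delta

# Cell centres lie half a cell size beyond the start edge
dim_centres <- function(d) dim_edges(d) + d$delta / 2

# A full regular dimension with five cells of size 1, starting at 0
x_full <- list(from = 1, to = 5, offset = 0, delta = 1)

# The same dimension after cutting out cells 3 to 5 (offset unchanged)
x_sub <- list(from = 3, to = 5, offset = 0, delta = 1)

dim_edges(x_full)    # start edges of cells 1..5
dim_centres(x_full)  # centres of cells 1..5
dim_edges(x_sub)     # start edges of cells 3..5
```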

Grid type

Regular grids

With a very simple object created from a \(5 \times 4\) matrix

suppressPackageStartupMessages(library(stars))
m = matrix(1:20, nrow = 5, ncol = 4)
dim(m) = c(x = 5, y = 4) # named dim
(s = st_as_stars(m))
## stars object with 2 dimensions and 1 attribute
## attribute(s):
##     Min. 1st Qu. Median Mean 3rd Qu. Max.
## A1     1    5.75   10.5 10.5   15.25   20
## dimension(s):
##   from to offset delta refsys point values x/y
## x    1  5      0     1     NA FALSE   NULL [x]
## y    1  4      0     1     NA FALSE   NULL [y]

we see that

the rows (5) are mapped to the first dimension, the x-coordinate
the columns (4) are mapped to the second dimension, the y-coordinate
the from and to fields of each dimension define a range that corresponds to the array dimension:

dim(s[[1]])
## x y 
## 5 4

offset and delta specify how increasing row and column index maps to x and y coordinate values respectively.

When we plot this object, using the image method for stars objects,

image(s, text_values = TRUE, axes = TRUE)

we see that \((0,0)\) is the origin of the grid (grid corner), and that the coordinate value increases by \(1\) from one index (row, col) to the next. It means that consecutive matrix columns represent grid lines, going from south to north. Grids defined this way are regular: grid cell size is constant everywhere.

Many actual grid datasets have y coordinates (grid rows) going from North to South (top to bottom); this is realised with a negative value for delta. We see that the grid origin \((0,0)\) did not change:

attr(s, "dimensions")[[2]]$delta = -1
image(s, text_values = TRUE, axes = TRUE)

An example is the GeoTIFF carried in the package, which, as probably all data sources read through GDAL, has a negative delta for the y-coordinate:

tif = system.file("tif/L7_ETMs.tif", package = "stars")
st_dimensions(read_stars(tif))["y"]
##   from  to  offset delta                     refsys point values
## y    1 352 9120761 -28.5 SIRGAS 2000 / UTM zone 25S FALSE   NULL
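As a rough check of what the negative delta implies, the northern and southern edges of this dimension can be computed in base R. This sketch copies the values from the printed output above (which may be rounded in print) into a plain list; y_dim is an illustrative name, not a stars object.

```r
# Values taken from the printed y dimension of L7_ETMs.tif
y_dim <- list(from = 1, to = 352, offset = 9120761, delta = -28.5)

# With a negative delta, the offset is the northern (top) edge
north_edge <- y_dim$offset + (y_dim$from - 1) * y_dim$delta

# The far edge of the last cell is the southern (bottom) edge
south_edge <- y_dim$offset + y_dim$to * y_dim$delta

c(north = north_edge, south = south_edge, height = north_edge - south_edge)
```

The y coordinate decreases as the row index increases, so the first row of the array is the northernmost one.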

Raster attributes, rotated and sheared grids

Dimension tables of stars objects carry a raster attribute:

str(attr(st_dimensions(s), "raster"))
## List of 3
##  $ affine     : num [1:2] 0 0
##  $ dimensions : chr [1:2] "x" "y"
##  $ curvilinear: logi FALSE
##  - attr(*, "class")= chr "stars_raster"

which is a list that holds

dimensions: character, the names of raster dimensions (if any), as opposed to e.g. spectral, temporal or other dimensions
affine: numeric, the affine parameters
curvilinear: a logical indicating whether the raster is curvilinear

These fields are needed at this level, because they describe properties of the array at a higher level than individual dimensions do: a pair of dimensions forms a raster, both affine and curvilinear describe how x and y as a pair are derived from grid indexes (see below) when this cannot be done on a per-dimension basis.

With two affine parameters \(a_1\) and \(a_2\), \(x\) and \(y\) coordinates are derived from (1-based) grid indexes \(i\) and \(j\), grid offset values \(o_x\) and \(o_y\), and grid cell sizes \(d_x\) and \(d_y\) by

\[x = o_x + (i-1) d_x + (j-1) a_1\]

\[y = o_y + (i-1) a_2 + (j-1) d_y\]

Clearly, when \(a_1=a_2=0\), \(x\) and \(y\) are entirely derived from their respective index, offset and cell size.

Note that for integer indexes, the coordinates are that of the starting edge of a grid cell; to get the grid cell center of the top left grid cell (in case of a negative \(d_y\)), use \(i=1.5\) and \(j=1.5\).
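The two equations can be written as a small base R function to experiment with. The name grid_to_xy and its argument layout are hypothetical and not part of stars; offset, delta and affine are length-2 vectors ordered as (x, y) and \((a_1, a_2)\) respectively.

```r
# Map (1-based, possibly fractional) grid indexes i, j to coordinates
grid_to_xy <- function(i, j, offset, delta, affine = c(0, 0)) {
  x <- offset[1] + (i - 1) * delta[1] + (j - 1) * affine[1]
  y <- offset[2] + (i - 1) * affine[2] + (j - 1) * delta[2]
  c(x = x, y = y)
}

# Integer indexes give the starting edge (corner) of a cell
corner <- grid_to_xy(1, 1, offset = c(0, 0), delta = c(1, -1))

# i = j = 1.5 gives the centre of the top left cell
centre <- grid_to_xy(1.5, 1.5, offset = c(0, 0), delta = c(1, -1))

# Non-zero affine parameters make x depend on j and y depend on i
centre_rot <- grid_to_xy(1.5, 1.5, offset = c(0, 0), delta = c(1, -1),
                         affine = c(0.1, 0.1))
rbind(corner, centre, centre_rot)
```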

We can rotate grids by setting \(a_1\) and \(a_2\) to a non-zero value:

attr(attr(s, "dimensions"), "raster")$affine = c(0.1, 0.1)
plot(st_as_sf(s, as_points = FALSE), axes = TRUE, nbreaks = 20)

The rotation angle, in degrees, is

atan2(0.1, 1) * 180 / pi
## [1] 5.710593

Sheared grids are obtained when the two rotation coefficients, \(a_1\) and \(a_2\), are unequal:

attr(attr(s, "dimensions"), "raster")$affine = c(0.1, 0.2)
plot(st_as_sf(s, as_points = FALSE), axes = TRUE, nbreaks = 20)

Now the y-axis and x-axis have different rotation angles, in degrees, of respectively

atan2(c(0.1, 0.2), 1) * 180 / pi
## [1]  5.710593 11.309932

Edzer Pebesma
