The goal of cereal is to provide methods to serialize vctrs objects to JSON, and to deserialize that JSON back into vctrs objects.
You can install the released version of cereal from CRAN with:
install.packages("cereal")
You can install the development version of cereal from GitHub with:
# install.packages("pak")
pak::pak("r-lib/cereal")
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows). Each variable is a vector of one data type, like factor or datetime:
df <- tibble::tibble(
  a = c(1.2, 2.3, 3.4),
  b = 2L:4L,
  c = Sys.Date() + 0:2,
  d = as.POSIXct("2019-01-01", tz = "America/New_York") + 100:102,
  e = sample(letters, 3),
  f = factor(c("blue", "blue", "green"), levels = c("blue", "green", "red")),
  g = ordered(c("small", "large", "medium"), levels = c("small", "medium", "large"))
)
df
#> # A tibble: 3 × 7
#>       a     b c          d                   e     f     g
#>   <dbl> <int> <date>     <dttm>              <chr> <fct> <ord>
#> 1   1.2     2 2023-06-08 2019-01-01 00:01:40 z     blue  small
#> 2   2.3     3 2023-06-09 2019-01-01 00:01:41 c     blue  large
#> 3   3.4     4 2023-06-10 2019-01-01 00:01:42 t     green medium
The vctrs package has a concept of a vector prototype which captures the metadata associated with a vector without keeping any of the data itself.
vctrs::vec_ptype(df)
#> # A tibble: 0 × 7
#> # ℹ 7 variables: a <dbl>, b <int>, c <date>, d <dttm>, e <chr>, f <fct>,
#> # g <ord>
The information stored in such a vector prototype includes, for
example, the levels of a factor and the timezone for a datetime. This
can be useful or important information when deploying code or models,
such as when using vetiver.
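To make the point about retained metadata concrete, here is a minimal sketch using only `vctrs::vec_ptype()` on a single factor and a single datetime vector. The variable names (`f_ptype`, `d_ptype`) are illustrative and do not come from cereal itself.

```r
library(vctrs)

# A factor prototype has length zero but keeps all of its levels,
# including levels that never appear in the data ("red").
f <- factor(c("blue", "blue", "green"), levels = c("blue", "green", "red"))
f_ptype <- vec_ptype(f)
length(f_ptype)
levels(f_ptype)

# A datetime prototype has length zero but keeps its time zone attribute.
d <- as.POSIXct("2019-01-01", tz = "America/New_York") + 100:102
d_ptype <- vec_ptype(d)
length(d_ptype)
attr(d_ptype, "tzone")
```

Inspecting single vectors this way shows which attributes travel with a prototype even though the data values are dropped.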
We could store this vector prototype as an R binary object saved as an .rds file, but with cereal, you can store this vector prototype in plain text as JSON:
library(cereal)
json <- cereal_to_json(df)
json
#> {
#>   "a": {
#>     "type": "numeric",
#>     "example": "1.2",
#>     "details": []
#>   },
#>   "b": {
#>     "type": "integer",
#>     "example": "2",
#>     "details": []
#>   },
#>   "c": {
#>     "type": "Date",
#>     "example": "2023-06-08",
#>     "details": []
#>   },
#>   "d": {
#>     "type": "POSIXct",
#>     "example": "2019-01-01 00:01:40",
#>     "details": {
#>       "tzone": "America/New_York"
#>     }
#>   },
#>   "e": {
#>     "type": "character",
#>     "example": "z",
#>     "details": []
#>   },
#>   "f": {
#>     "type": "factor",
#>     "example": "blue",
#>     "details": {
#>       "levels": ["blue", "green", "red"]
#>     }
#>   },
#>   "g": {
#>     "type": "ordered",
#>     "example": "small",
#>     "details": {
#>       "levels": ["small", "medium", "large"]
#>     }
#>   }
#> }
Storing prototype information as JSON (rather than a binary file) means it can be used as plain-text metadata for a model.
You can also convert from JSON back to the original prototype:
cereal_from_json(json)
#> # A tibble: 0 × 7
#> # ℹ 7 variables: a <dbl>, b <int>, c <date>, d <dttm>, e <chr>, f <fct>,
#> # g <ord>
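One way to check the round trip is to compare the restored prototype against the one vctrs computes directly. The sketch below uses a smaller, hypothetical two-column data frame and assumes that `cereal_from_json()` restores factor levels and the datetime time zone, as the JSON `details` fields above suggest.

```r
library(cereal)

# A small data frame with the two column types that carry extra metadata.
df <- tibble::tibble(
  f = factor(c("blue", "blue", "green"), levels = c("blue", "green", "red")),
  d = as.POSIXct("2019-01-01", tz = "America/New_York") + 100:102
)

# Serialize the prototype to JSON, then read it back.
json <- cereal_to_json(df)
restored <- cereal_from_json(json)

# Compare the restored prototype with the one vctrs computes directly.
original <- vctrs::vec_ptype(df)
identical(levels(restored$f), levels(original$f))
identical(attr(restored$d, "tzone"), attr(original$d, "tzone"))
```

A comparison like this could be run before relying on stored JSON metadata in a deployment.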
For an approach to this same task using Python, see Pydantic’s model.json().
This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
If you think you have encountered a bug, please submit an issue.
When you report a problem, learn how to create and share a reprex (a minimal, reproducible example) to clearly communicate about your code.