Class CsvSchemaInferrer

java.lang.Object
org.elasticsearch.xpack.esql.datasource.csv.CsvSchemaInferrer

public class CsvSchemaInferrer extends Object
Infers column types from CSV data by sampling rows when headers lack explicit type annotations.

Each column starts at the most specific candidate type and widens on the first value that doesn't fit. Candidate types, from most specific to least specific:

  1. BOOLEAN — only true/false (case-insensitive)
  2. INTEGER — fits in int
  3. LONG — fits in long
  4. DOUBLE — any floating-point number
  5. DATETIME — ISO-8601, date-only, zone-less timestamps
  6. KEYWORD — universal fallback (everything is a string)

Null and empty values are compatible with every type. Columns that contain only null or empty values default to KEYWORD.

When a value doesn't fit the current candidate, the column widens to the next candidate in the list. The exception is a BOOLEAN or DATETIME column that has already been confirmed by at least one value: on a mismatch it skips directly to KEYWORD, since a column containing both "true" and "42" is most likely a string column, not a numeric one.
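
The widening rules above can be illustrated with a minimal sketch. It is not the actual CsvSchemaInferrer code: the class name TypeWideningSketch, the Candidate enum, and the methods inferColumn, fits, and parses are hypothetical, and the parsers used for each candidate (Integer.parseInt, Long.parseLong, Double.parseDouble, and the java.time ISO parsers) are assumptions about what "fits" means. The sketch follows the description literally and does not re-check earlier values after a column widens.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.List;

/** Hypothetical illustration of the documented widening rules; not the real implementation. */
final class TypeWideningSketch {

    // Candidates in the documented order, most specific first.
    enum Candidate { BOOLEAN, INTEGER, LONG, DOUBLE, DATETIME, KEYWORD }

    /** Infers a single column's type from its sampled values. */
    static Candidate inferColumn(List<String> values) {
        Candidate current = Candidate.BOOLEAN;
        boolean confirmed = false; // has a non-empty value matched `current`?
        boolean sawValue = false;  // has the column had any non-empty value?
        for (String value : values) {
            if (value == null || value.isEmpty()) {
                continue; // null and empty are compatible with every type
            }
            sawValue = true;
            // KEYWORD accepts everything, so this loop always terminates.
            while (!fits(current, value)) {
                boolean skipToKeyword = confirmed
                    && (current == Candidate.BOOLEAN || current == Candidate.DATETIME);
                current = skipToKeyword
                    ? Candidate.KEYWORD
                    : Candidate.values()[current.ordinal() + 1];
                confirmed = false;
            }
            confirmed = true;
        }
        // Columns with only null/empty values default to KEYWORD.
        return sawValue ? current : Candidate.KEYWORD;
    }

    /** Assumed compatibility checks; the real parsers may accept different inputs. */
    static boolean fits(Candidate candidate, String value) {
        switch (candidate) {
            case BOOLEAN:
                return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
            case INTEGER:
                return parses(() -> Integer.parseInt(value));
            case LONG:
                return parses(() -> Long.parseLong(value));
            case DOUBLE:
                return parses(() -> Double.parseDouble(value));
            case DATETIME:
                // Offset timestamps, zone-less timestamps, then date-only values.
                return parses(() -> OffsetDateTime.parse(value))
                    || parses(() -> LocalDateTime.parse(value))
                    || parses(() -> LocalDate.parse(value));
            default:
                return true; // KEYWORD is the universal fallback
        }
    }

    private static boolean parses(Runnable parser) {
        try {
            parser.run();
            return true;
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }
}
```

Under these assumptions, inferColumn(List.of("1", "3000000000")) yields LONG because the second value overflows int, while inferColumn(List.of("true", "42")) yields KEYWORD because the BOOLEAN candidate was already confirmed before the mismatch.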

For files with fewer rows than the sample size, all rows are used. Inference runs in a single sequential pass over the sample.
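
The bounded single pass can be sketched as follows. The class SampleReaderSketch, the method forEachSampledRow, and its parameters are hypothetical names chosen for illustration. How the real class reads and splits rows, and how the sample size is configured, are not described here.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical illustration of sampling in one sequential pass; not the real implementation. */
final class SampleReaderSketch {

    /**
     * Feeds at most sampleSize rows to rowHandler, in order, and returns the
     * number of rows actually consumed. If the source has fewer rows than
     * sampleSize, every row is used.
     */
    static int forEachSampledRow(Iterator<String[]> rows, int sampleSize, Consumer<String[]> rowHandler) {
        int consumed = 0;
        while (consumed < sampleSize && rows.hasNext()) {
            rowHandler.accept(rows.next()); // e.g. update each column's candidate type
            consumed++;
        }
        return consumed;
    }
}
```

Because each row is handled once and never revisited, per-column state such as the current candidate type can be updated inside the handler without buffering the sample.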