vgi-java

Serve Haybarn catalogs, tables, and functions from a Java process over Apache Arrow IPC — the Java implementation of the VGI (Vector Gateway Interface) protocol.
Built by 🚜 Query.Farm

VGI lets Haybarn — Query Farm's independent derived distribution of DuckDB — ATTACH a catalog whose schemas, tables, and functions live in an external worker process. The vgi extension speaks an Arrow-IPC RPC protocol to that worker; this library is everything you need to write the worker side in Java. Your code registers functions and tables against a Worker builder — the library handles the wire protocol, schema negotiation, batch streaming, pushdown, and transports.

Wire-compatible with the Python reference implementation and the Go port: all three serve the same integration suite against the same C++ extension.

What you can serve

Catalog tables — named tables with inline schemas, comments, tags, constraints, foreign keys, and per-column statistics that feed the engine's optimizer.
Scalar functions — annotation-driven (ScalarFn): declare a compute() method and the parameter annotations generate the spec, bind-time validation, and dispatch.
Table functions — streaming producers with projection pushdown, filter pushdown, row-id, sampling, and time-travel (AT) support.
Table-in/out functions — exchange-style streaming transforms over input batches.
Table buffering functions — sink/source functions that buffer all input before emitting (distributed-aggregation style lifecycles: process → combine → finalize).
Aggregate functions — partial aggregation with cross-process state combine.
Catalog versioning — semver data/implementation version negotiation, release manifests, multi-branch tables, transactions, and attach options.
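
The process → combine → finalize lifecycle named above for buffering and aggregate functions can be illustrated without any library code. The sketch below is plain Java and does not use the vgi API: AvgState and its methods are hypothetical names chosen for illustration (finish() stands in for finalize, which would clash with Object.finalize()). It shows why an average travels as a (sum, count) pair: partial states built in separate processes can be merged in any order, and only the last step turns the merged state into a result.

```java
import java.util.List;

/** Illustrative only: a mergeable partial state for AVG, not part of the vgi API. */
record AvgState(long sum, long count) {

    /** The empty state: the identity element for combine(). */
    static AvgState empty() {
        return new AvgState(0, 0);
    }

    /** process: fold one batch of input values into the state; nulls are skipped. */
    AvgState process(List<Long> batch) {
        long s = sum;
        long c = count;
        for (Long v : batch) {
            if (v != null) {
                s += v;
                c++;
            }
        }
        return new AvgState(s, c);
    }

    /** combine: merge a partial state that was built elsewhere, e.g. in another process. */
    AvgState combine(AvgState other) {
        return new AvgState(sum + other.sum, count + other.count);
    }

    /** finalize step: produce the result, or null when no non-null rows were seen. */
    Double finish() {
        return count == 0 ? null : (double) sum / count;
    }
}

final class AvgDemo {
    public static void main(String[] args) {
        AvgState a = AvgState.empty().process(List.of(1L, 2L, 3L));
        AvgState b = AvgState.empty().process(List.of(10L));
        System.out.println(a.combine(b).finish()); // 16 / 4 rows, prints 4.0
    }
}
```

Shipping only the finished average from each partial would lose the row counts, so the merged result could not be weighted correctly; keeping the state mergeable avoids that.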

Requirements

Java 21+ at runtime. The shared-memory side-channel (zero-copy batch transfer with a co-located engine) additionally requires Java 22+; on Java 21 the worker transparently falls back to the pipe transport.
Haybarn with the vgi extension installed on the client side (it's in Haybarn's signed community channel: INSTALL vgi FROM community).
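
The fallback described above happens inside the library, so worker code does not need to handle it. For logging or diagnostics, a worker could still report which case applies by checking the runtime's feature version. The sketch below uses only the standard Runtime.version() API. The TransportCheck class and its expectedTransport helper are hypothetical, and the returned labels are descriptive strings rather than vgi identifiers. It also ignores the co-location requirement, which cannot be determined from the JDK version alone.

```java
/** Illustrative only: maps a Java feature version to the transport the requirements imply. */
final class TransportCheck {

    /** Shared memory needs Java 22+ per the requirement above; Java 21 uses the pipe. */
    static String expectedTransport(int javaFeatureVersion) {
        if (javaFeatureVersion < 21) {
            throw new IllegalArgumentException("Java 21+ is required, got " + javaFeatureVersion);
        }
        return javaFeatureVersion >= 22 ? "shared-memory" : "pipe";
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature(); // e.g. 21 for JDK 21.0.2
        System.out.println("Java " + feature + ": " + expectedTransport(feature));
    }
}
```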

Installation

Artifacts are published to Maven Central under the farm.query group.

Gradle (Kotlin DSL):

dependencies {
    implementation("farm.query:vgi:0.1.0")
}

Maven:

<dependency>
  <groupId>farm.query</groupId>
  <artifactId>vgi</artifactId>
  <version>0.1.0</version>
</dependency>

The RPC layer (farm.query:vgirpc) comes in transitively.

Quickstart

A worker with one scalar function:

import farm.query.vgi.Worker;
import farm.query.vgi.scalar.Const;
import farm.query.vgi.scalar.ScalarFn;
import farm.query.vgi.scalar.Vector;
import org.apache.arrow.vector.BigIntVector;

public final class DemoWorker {

    /** {@code multiply(value INT64, factor INT64 [const]) -> INT64} */
    static final class Multiply extends ScalarFn {
        @Override public String name() { return "multiply"; }
        @Override public String description() { return "Multiplies a value by a constant factor"; }

        // The trailing unannotated vector is the output; the framework allocates it.
        public void compute(@Vector BigIntVector value, @Const long factor, BigIntVector result) {
            int rows = value.getValueCount();
            for (int i = 0; i < rows; i++) {
                if (value.isNull(i)) {
                    result.setNull(i); // propagate SQL NULL
                } else {
                    result.set(i, value.get(i) * factor);
                }
            }
        }
    }

    public static void main(String[] args) {
        Worker worker = Worker.builder()
                .catalogName("demo")
                .registerScalar(new Multiply());
        worker.runFromArgs(args); // stdio by default; --unix / --http via flags
    }
}

The compute() signature drives everything: @Vector parameters are per-row input columns, @Const parameters are bind-time constants, @Setting parameters read session settings, and the last unannotated Arrow vector is the framework-allocated output.

The worker JVM needs two flags — Apache Arrow requires access to java.nio internals, and the shared-memory transport uses the FFM API:

--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED
--enable-native-access=ALL-UNNAMED

With the Gradle application plugin, bake them into the start script so the worker binary is self-contained:

application {
    mainClass.set("DemoWorker")
    applicationDefaultJvmArgs = listOf(
        "--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED",
        "--enable-native-access=ALL-UNNAMED",
    )
}

Without the --add-opens flag, the worker fails at the first query with the error "Failed to initialize MemoryUtil".
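
Because a missing flag only surfaces at the first query, it can help to check the flags at startup and report the problem sooner. The sketch below is one possible approach, not something the library is stated to provide. It inspects the JVM's own input arguments through the standard RuntimeMXBean, and JvmFlagCheck and missingFlags are hypothetical names. It matches on flag prefixes only, so it assumes the flags are passed in the --flag=value form shown above; flags supplied some other way (for example through a JAR manifest) may not be detected.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: report which of the JVM flags listed above are missing. */
final class JvmFlagCheck {

    // Prefixes of the two flags from this README; the module list after '=' may vary.
    private static final List<String> REQUIRED_PREFIXES = List.of(
            "--add-opens=java.base/java.nio=",
            "--enable-native-access=");

    /** Returns the required prefixes that no argument in jvmArgs starts with. */
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String prefix : REQUIRED_PREFIXES) {
            boolean present = jvmArgs.stream().anyMatch(arg -> arg.startsWith(prefix));
            if (!present) {
                missing.add(prefix);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Arguments the JVM itself was started with, not the program's own args.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        if (missing.isEmpty()) {
            System.out.println("JVM flags look OK");
        } else {
            System.err.println("Missing JVM flags: " + missing);
        }
    }
}
```

A real worker might call missingFlags at the top of main and exit with a non-zero status instead of only printing a warning.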

Attach and query it from Haybarn:

INSTALL vgi FROM community;
LOAD vgi;
ATTACH 'demo' AS demo (TYPE vgi, LOCATION 'launch:/path/to/demo-worker');
SELECT demo.multiply(21, 2);  -- 42

The launch: location scheme starts the worker once behind a flock-coordinated Unix socket and reuses it across queries and engine processes — essential for JVM workers, which are expensive to cold-start. Plain subprocess (/path/to/worker) and http(s):// locations also work.

Example worker

The vgi-example-worker module (not published) is a complete worker with 90+ functions — scalar, table, aggregate, table-in/out, buffering, partitioned, multi-branch, transactional — that serves the canonical VGI integration suite. It is the best place to look for working patterns of any feature.

Related projects

Repository	What it is
Query-farm-haybarn/haybarn	Haybarn — the independent derived distribution of DuckDB by Query Farm
Query-farm/vgi	The vgi engine extension (C++) — the client side of the protocol
Query-farm/vgi-python	Python reference implementation of the worker side
Query-farm/vgi-go	Go implementation of the worker side
Query-farm/vgi-rpc-java	The transport-agnostic Arrow RPC framework this library builds on

License

Query Farm Source-Available License, Version 1.0.
