Run txtai in native code
Execute workflows in native code with the Python C API
txtai currently has two main methods of execution: directly in Python or via an HTTP API. API bindings are available for JavaScript, Java, Rust and Go.
This article presents a way to run txtai as part of a native executable with the Python C API. We'll run an example in C and even call txtai from assembly code!
Before diving into this article, it's important to emphasize that connecting to txtai via the HTTP API has a number of major advantages. These include decoupling from Python, the ability to offload txtai to a different machine and scaling with cloud compute. With that being said, this article demonstrates an additional way to integrate txtai and doubles as an informative, perhaps academic, programming exercise.
Install dependencies
Install txtai and all dependencies.
# Install txtai
pip install txtai[pipeline] sacremoses
# Remove tensorflow as it's not used and prints noisy log messages
pip uninstall -y tensorflow
# Install python3.7-dev and nasm
apt-get install python3.7-dev nasm
Workflow configuration
This configuration defines two workflows: summary, which extracts text from a URL and summarizes it, and translate, which translates input text to French. More information on workflows can be found in txtai's documentation.
summary:
  path: sshleifer/distilbart-cnn-12-6

textractor:
  join: true
  lines: false
  minlength: 100
  paragraphs: true
  sentences: false
  tika: false

translation:

workflow:
  summary:
    tasks:
      - action: textractor
        task: url
      - action: summary
  translate:
    tasks:
      - action: translation
        args:
          - fr
Python C API
Next we'll build an interface to txtai workflows with the Python C API. This logic will load Python, create a txtai application instance and add methods to run workflows.
Some assumptions are made:
txtai is installed and available
A workflow is available in a file named config.yml
The workflow only returns the first element
These assumptions are for brevity. This example could be expanded on and built into a more robust, full-fledged library.
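As one small step toward a more robust library, the config.yml assumption can be checked before the interpreter starts. The sketch below is illustrative only: config_available is a hypothetical helper, not part of txtai or of the code in this article, and it only verifies that the file can be opened for reading.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of txtai): returns 1 if the file at path
 * can be opened for reading, 0 otherwise.
 */
int config_available(const char *path) {
    assert(path != NULL);

    FILE *file = fopen(path, "r");
    if (!file) {
        return 0;
    }

    fclose(file);
    return 1;
}
```

A caller could run this check against config.yml and print a clear error message before starting Python.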
While this example is in C, Rust has a well-maintained and popular library for interfacing with Python, PyO3. Interfacing with the Python C API is also possible in Java, JavaScript and Go but not as straightforward.
#include <Python.h>

// Global instances
PyObject *module = NULL, *app = NULL;

// Most recent result, kept alive so the string returned by workflow() stays valid
PyObject *item = NULL;

/**
 * Create txtai module.
 */
PyObject* txtai() {
    PyObject* module = NULL;
    module = PyImport_ImportModule("txtai.app");
    return module;
}

/**
 * Create txtai application instance.
 */
PyObject* application() {
    PyObject* app = NULL;
    app = PyObject_CallMethod(module, "Application", "z", "config.yml");
    return app;
}

/**
 * Run txtai workflow.
 */
PyObject* run(char** args) {
    PyObject* result = NULL;
    result = PyObject_CallMethod(app, "workflow", "z[z]", args[0], args[1]);
    return result;
}

/**
 * Cleanup Python objects.
 */
void cleanup() {
    // Ensure Python instance exists
    if (Py_IsInitialized()) {
        // Print the pending error, if any
        if (PyErr_Occurred()) {
            PyErr_Print();
        }

        Py_CLEAR(item);
        Py_CLEAR(app);
        Py_CLEAR(module);
        Py_FinalizeEx();
    }
}

/**
 * Initialize a txtai application and run a workflow.
 */
const char* workflow(char** args) {
    PyObject* result = NULL;

    // Create application instance if it doesn't already exist
    if (!Py_IsInitialized()) {
        // Start Python Interpreter
        Py_Initialize();

        // Create txtai module
        module = txtai();

        // Handle errors
        if (!module) {
            cleanup();
            return NULL;
        }

        // Create txtai application
        app = application();

        // Handle errors
        if (!app) {
            cleanup();
            return NULL;
        }
    }

    // Release the result of the previous call
    Py_CLEAR(item);

    // Run workflow
    result = run(args);

    // Handle errors
    if (!result) {
        cleanup();
        return NULL;
    }

    // Get first result
    item = PyIter_Next(result);

    // Cleanup result
    Py_CLEAR(result);

    // Handle errors and empty results
    if (!item) {
        cleanup();
        return NULL;
    }

    // The UTF-8 buffer is owned by item and stays valid until the next call or cleanup
    return PyUnicode_AsUTF8(item);
}
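The string returned by workflow is a buffer owned by a Python object, so it should be treated as borrowed: it is not safe to use after cleanup() finalizes the interpreter. A caller that needs to keep the text longer can copy it first. The following is a minimal sketch in plain C; copy_result and release_result are hypothetical helper names, not part of the code above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap-allocated copy of text.
 * Returns NULL if text is NULL (workflow failed) or allocation fails.
 */
char *copy_result(const char *text) {
    if (!text) {
        return NULL;
    }

    // Include the terminating null byte
    size_t size = strlen(text) + 1;

    char *copy = malloc(size);
    if (copy) {
        memcpy(copy, text, size);
    }

    return copy;
}

/*
 * Hypothetical helper: frees a copy made by copy_result and resets the
 * caller's pointer so it cannot be reused by accident.
 */
void release_result(char **copy) {
    assert(copy != NULL);

    free(*copy);
    *copy = NULL;
}
```

With helpers like these, a program could copy each result, call cleanup() and still print or store the copied text afterwards.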
Run txtai workflow in C
Let's now write a C program to run a workflow using command line arguments as input.
#include <stdio.h>

extern const char* workflow(char** args);
extern void cleanup();

/**
 * Run a txtai workflow and print results.
 */
int main(int argc, char** argv) {
    if (argc < 3) {
        printf("Usage: workflow <name> <element>\n");
        return 1;
    }

    // Run workflow using command line arguments (skip program name)
    const char* text = workflow(argv + 1);
    if (text) {
        printf("%s\n", text);
    }

    // Cleanup
    cleanup();

    return 0;
}
Compile and run
Time to compile this all into an executable and run!
cc -c main.c -I/usr/include/python3.7m
cc -c workflow.c -I/usr/include/python3.7m
cc -o workflow workflow.o main.o -lpython3.7m
./workflow translate "I'm running machine translation using a transformers model in C!"
J'exécute la traduction automatique à l'aide d'un modèle de transformateurs en C!
And there it is, a translation workflow from English to French in a native executable, all backed by Transformers models. Any workflow YAML can be loaded and run in C using this method, which is pretty powerful.
Embedding txtai in a native executable adds libpython as a dependency (libraries from third-party modules such as PyTorch and NumPy are also loaded dynamically at runtime). See the output of ldd below. This approach is an option for embedding txtai in native code, provided it is acceptable to add libpython as a project dependency.
As mentioned above, connecting to a txtai HTTP API instance is a less tightly coupled way to accomplish the same thing.
ldd workflow | grep python
libpython3.7m.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.7m.so.1.0 (0x00007efcba85e000)
Machine learning in Assembly?
Now for a more academic exercise, one that may bring you back to a computer organization/logic class from college. Let's see if we can run the same program in assembly!
global main

; External C library functions
extern puts

; External txtai functions
extern workflow, cleanup

; Default to REL mode
default REL

section .data

message: db "Usage: workflow <name> <element>", 0

section .text

; Print a usage message
usage:
    mov rdi, message
    call puts
    jmp done

; Main function
main:
    ; Enter
    sub rsp, 8

    ; Read argc - require workflow name and element (plus program name)
    cmp rdi, 3
    jl usage

    ; Run txtai workflow with argv params (skip program name) and print result
    lea rdi, [rsi + 8]
    call workflow
    mov rdi, rax
    call puts

done:
    ; Close txtai application instance
    call cleanup

    ; Exit
    add rsp, 8
    ret
# Build workflow executable
nasm -felf64 main.asm
cc -c workflow.c -I/usr/include/python3.7m
cc -o workflow -no-pie workflow.o main.o -lpython3.7m
./workflow translate "I'm running machine translation using a transformers model with assembler!"
J'exécute la traduction automatique à l'aide d'un modèle de transformateurs avec assembleur!
Just as before, the input text is translated to French using a machine translation model. But this time the code executing the logic was in assembly!
This is probably not terribly useful in practice, but running from the lowest level of code shows that any higher-level native language can do the same.
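For readers less familiar with x86-64 assembly, lea rdi, [rsi + 8] is the counterpart of argv + 1 in the C program: main receives argv in rsi, pointers are 8 bytes wide, so adding 8 skips the program name. The sketch below shows the same pointer arithmetic in C. The function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring `lea rdi, [rsi + 8]`: returns a pointer to
 * argv[1], so index 0 is the workflow name and index 1 is the element.
 */
char **skip_program_name(char **argv) {
    assert(argv != NULL);

    return argv + 1;
}

/*
 * Hypothetical helper: byte distance between argv and argv + 1. This equals
 * sizeof(char *), which is the 8 in the lea instruction on x86-64.
 */
size_t skip_offset(char **argv) {
    return (size_t) ((char *) skip_program_name(argv) - (char *) argv);
}
```

The compiler performs this scaling by the pointer size automatically in C, while the assembly version has to spell out the byte offset.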
Multiple workflow calls
Everything up to this point has been a single workflow call, and much of the runtime is spent loading models as part of the txtai workflow. The next example runs a series of workflow calls and compares the total time against a single workflow call from the command line, once again in assembly.
global main

; External C library functions
extern printf

; External txtai functions
extern workflow, cleanup

; Default to REL mode
default REL

section .data

format: db "action: %s", 10, "input: %s", 10, "output: %s", 10, 10, 0
summary: db "summary", 0
translate: db "translate", 0
text1: db "txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.", 0
text2: db "Traditional search systems use keywords to find data", 0
url1: db "https://github.com/neuml/txtai", 0
url2: db "https://github.com/neuml/paperai", 0

section .text

; Run txtai workflow and print results
%macro txtai 2
    ; Workflow name and element
    push %2
    push %1

    ; Run workflow
    lea rdi, [rsp]
    call workflow

    ; Print action-input-output
    mov rdi, format
    mov rsi, [rsp]
    mov rdx, [rsp + 8]
    mov rcx, rax

    ; Variadic call: al holds the number of vector registers used (none)
    xor eax, eax
    call printf

    ; Restore stack
    add rsp, 16
%endmacro

; Main function
main:
    ; Enter
    sub rsp, 8

    ; Run workflows
    txtai translate, text1
    txtai translate, text2
    txtai summary, url1
    txtai summary, url2

done:
    ; Close txtai application instance
    call cleanup

    ; Exit
    add rsp, 8
    ret
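The txtai macro pushes the element and then the workflow name, which leaves a two-entry array of string pointers on the stack, and passes the address of that array to workflow. A rough C equivalent is sketched below. run_and_print and echo_workflow are hypothetical names: echo_workflow is a stand-in so the sketch runs without Python, and the real workflow function from workflow.c has the same signature.

```c
#include <assert.h>
#include <stdio.h>

/* Signature shared with workflow() in workflow.c */
typedef const char *(*workflow_fn)(char **args);

/*
 * Stand-in for workflow(): returns the input element unchanged, or NULL
 * for an empty element to simulate a failed call.
 */
const char *echo_workflow(char **args) {
    return args[1][0] ? args[1] : NULL;
}

/*
 * Hypothetical helper: runs one workflow call and prints
 * action-input-output. Returns 1 on success, 0 if the call failed.
 */
int run_and_print(workflow_fn fn, char *name, char *element) {
    assert(fn != NULL);

    // Same layout the macro builds with its two pushes: [name, element]
    char *args[2] = {name, element};

    const char *output = fn(args);
    if (!output) {
        return 0;
    }

    printf("action: %s\ninput: %s\noutput: %s\n\n", name, element, output);
    return 1;
}
```

Passing the function as a parameter is a choice made for this sketch so it can be exercised with a stand-in. A real program would call workflow directly and handle a NULL result before printing.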
time ./workflow translate "I'm running machine translation using a transformers model with assembler!"
J'exécute la traduction automatique à l'aide d'un modèle de transformateurs avec assembleur!
real 0m19.208s
user 0m11.256s
sys 0m3.224s
# Build workflow executable
nasm -felf64 main.asm
cc -c workflow.c -I/usr/include/python3.7m
cc -no-pie -o workflow workflow.o main.o -lpython3.7m
time ./workflow
action: translate
input: txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications.
output: txtai exécute des workflows d'apprentissage automatique pour transformer les données et construire des applications de recherche sémantique alimentées par l'IA.
action: translate
input: Traditional search systems use keywords to find data
output: Les systèmes de recherche traditionnels utilisent des mots-clés pour trouver des données
action: summary
input: https://github.com/neuml/txtai
output: txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. API bindings for JavaScript, Java, Rust and Go. Cloud-native architecture scales out with container orchestration systems (e. g. Kubernetes)
action: summary
input: https://github.com/neuml/paperai
output: paperai is an AI-powered literature discovery and review engine for medical/scientific papers. Paperai was used to analyze the COVID-19 Open Research Dataset (CORD-19) paperai and NeuML have been recognized in the following articles: Cord-19 Kaggle Challenge Awards Machine-Learning Experts Delve Into 47,000 Papers on Coronavirus Family.
real 0m22.478s
user 0m13.776s
sys 0m3.218s
As we can see, running four workflow actions takes about 22.5 seconds compared to about 19.2 seconds for a single action. Most of the runtime in both cases is model loading, which only happens once per process.
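The reason is that workflow only starts Python and creates the txtai application when Py_IsInitialized() reports that the interpreter is not running, so the expensive startup is paid once and reused by later calls. The sketch below isolates that load-once pattern in plain C. Everything in it is a stand-in: load_models simulates the expensive startup and run_once_loaded simulates workflow.

```c
#include <assert.h>
#include <stddef.h>

/* State shared across calls, similar to the global app instance */
static int initialized = 0;
static int load_count = 0;

/* Stand-in for starting Python and loading models (the slow part) */
static void load_models(void) {
    load_count++;
}

/*
 * Stand-in for workflow(): performs the expensive load on the first call
 * only, then returns the element unchanged.
 */
const char *run_once_loaded(const char *element) {
    assert(element != NULL);

    if (!initialized) {
        load_models();
        initialized = 1;
    }

    return element;
}

/* Number of times the expensive load has run */
int loads_performed(void) {
    return load_count;
}
```

Calling run_once_loaded four times in one process triggers a single load, which mirrors why four workflow actions cost only slightly more than one.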
Wrapping up
This article walked through an example of how to run txtai with native code. While the HTTP API is usually the better route, this is another way to work with txtai!