This article walks you through creating an extraction model, from naming it to measuring its accuracy. If you are not yet sure whether extraction is the right model type, read the "What is an extraction model?" article first.
Prerequisites
Before you begin, make sure you have:
- Access to the Models section of the Tekst platform.
- A clear list of the fields (entities) you want to pull out of each message.
- A few representative example messages or documents you can use to check the results.
Step 1: Create the model
- Go to the Models section and select Create model.
- Choose Extraction model as the type.
- Select Let's get started.
- Enter a Model name and an optional Model description.
- Continue to the configuration step.
Step 2: Define your entities
Entities are the fields you want the model to extract. On the model's configuration screen:
- Add an entity for each value you need.
- Give each entity a clear name and a description of what to look for.
- Set the output format for each entity (text, number, integer, true/false, or a structured object).
- For values that can appear more than once in a message, mark the entity as a list so the model can return several of them.
- If some values belong together, group them under a parent object entity.
Clear names and descriptions have a large effect on accuracy, so be specific about what each field means.
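If it helps to plan your entities before entering them on the configuration screen, the list can be drafted as a small data structure. The following is a minimal sketch only: the `Entity` class, its field names, and the example entities are hypothetical and are not part of the Tekst platform or any Tekst API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical planning aid; none of these names come from Tekst.
# The formats mirror the options listed above (true/false is written as "boolean").
OUTPUT_FORMATS = {"text", "number", "integer", "boolean", "object"}


@dataclass
class Entity:
    name: str
    description: str
    output_format: str = "text"
    is_list: bool = False  # True when the value can appear more than once
    children: List["Entity"] = field(default_factory=list)  # used by object entities

    def __post_init__(self):
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unknown output format: {self.output_format}")
        if self.children and self.output_format != "object":
            raise ValueError("only object entities can group child entities")


# A single value that appears once per message.
invoice_number = Entity(
    "invoice_number",
    "The invoice number printed at the top of the document",
)

# Values that belong together and can repeat: a list of objects.
order_lines = Entity(
    name="order_lines",
    description="Each product line in the order",
    output_format="object",
    is_list=True,
    children=[
        Entity("product_code", "Supplier product code, for example 'AB-1234'"),
        Entity("quantity", "Number of units ordered", output_format="integer"),
    ],
)

entities = [invoice_number, order_lines]
```

Writing the descriptions out this way makes it easier to spot vague or overlapping fields before you configure them.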
Step 3: Build a test set
A test set is a collection of real messages for which you have confirmed the correct extracted values. It is how Tekst measures accuracy and how the model improves.
- Add conversations or documents to the test set.
- For each one, confirm the correct value for every entity (the "ground truth").
- Include a range of examples, including documents with different layouts and ones where some fields are missing.
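As an illustration of what a labeled test set contains, the sketch below pairs each message with its confirmed values and checks that no entity was left unlabeled. The structure, the entity names, and the `unlabeled_fields` helper are assumptions made for this example; they do not reflect how Tekst stores test sets.

```python
# Hypothetical entity names, used only for this example.
ENTITY_NAMES = ["invoice_number", "quantity"]

test_set = [
    {
        "message": "Invoice INV-001: please ship 3 units of AB-1234.",
        "ground_truth": {"invoice_number": "INV-001", "quantity": 3},
    },
    {
        # No invoice number in this message: None records a confirmed "missing" value.
        "message": "Can you send 10 units of CD-5678?",
        "ground_truth": {"invoice_number": None, "quantity": 10},
    },
]


def unlabeled_fields(example):
    """Return entity names with no confirmed value (an explicit None counts as confirmed)."""
    return [name for name in ENTITY_NAMES if name not in example["ground_truth"]]


# Indexes of examples that still need labeling.
incomplete = [i for i, ex in enumerate(test_set) if unlabeled_fields(ex)]
```

Recording an explicit empty value for a missing field, rather than leaving it out, keeps "confirmed absent" distinct from "not yet labeled".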
Step 4: Review accuracy
Once you have a test set, Tekst evaluates the model and reports its accuracy per field and overall. Use the comparison view, which shows the Expected value next to the Predicted value for each field, to see exactly where the model is going wrong.
For a full explanation of how the score is calculated, see the "How extraction accuracy is measured" article.
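To make "per field and overall" concrete, here is a rough sketch that assumes simple exact-match scoring. That assumption is for illustration only and Tekst's actual calculation may differ; the function names and data layout are hypothetical.

```python
def field_accuracy(examples, fields):
    """Exact-match accuracy per field and overall (an assumed scoring rule)."""
    if not examples or not fields:
        raise ValueError("need at least one example and one field")
    correct = {f: 0 for f in fields}
    for ex in examples:
        for f in fields:
            if ex["expected"].get(f) == ex["predicted"].get(f):
                correct[f] += 1
    per_field = {f: correct[f] / len(examples) for f in fields}
    overall = sum(correct.values()) / (len(examples) * len(fields))
    return per_field, overall


def mismatches(examples, fields):
    """List (example index, field, expected, predicted) for every disagreement."""
    return [
        (i, f, ex["expected"].get(f), ex["predicted"].get(f))
        for i, ex in enumerate(examples)
        for f in fields
        if ex["expected"].get(f) != ex["predicted"].get(f)
    ]


fields = ["invoice_number", "quantity"]
examples = [
    {
        "expected": {"invoice_number": "INV-001", "quantity": 3},
        "predicted": {"invoice_number": "INV-001", "quantity": 3},
    },
    {
        "expected": {"invoice_number": None, "quantity": 10},
        "predicted": {"invoice_number": None, "quantity": 1},
    },
]

per_field, overall = field_accuracy(examples, fields)
errors = mismatches(examples, fields)
```

In this example `quantity` is right in one of two messages, so its accuracy is 0.5, while the overall figure is 0.75 because three of the four field values match. The `errors` list plays the same role as the comparison view: it shows the expected and predicted value side by side for each disagreement.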
Step 5: Publish the model
When you are satisfied with the results, publish the model so it starts extracting from live messages. The status badge tells you whether the model is up to date or still processing your latest changes.
What happens next
As your team corrects extracted values, those corrections feed back into the model and improve it over time. Tekst continuously monitors performance and retrains as needed. For details, see the "How often are the AI models retrained?" article.