Viko
Cloud-based (PaaS) text mining platform.
My role
Tech lead/CTO, responsible for the overall technical direction of the platform.
- Development methodology
- Technical architecture
- Technology and vendor selection
- Hiring of internal, contract and external resources
- Partnerships (e.g. AWS & Oracle)
Major challenges
The solution
Viko is a text mining platform that allows users to ask natural language questions, e.g. “what is the bin collection schedule?” or “what is the customer service phone number?” Viko reads through a document corpus, typically website CMS pages, FAQs, and Word and PDF documents, and uses these data sources as context when answering a question.
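To make "using the corpus as context" concrete, here is a minimal, hypothetical sketch of a retrieval step. Given a list of passages (for example, chunks of CMS pages or FAQs), it ranks them against the question and keeps the best few as context. It uses plain TF-IDF term overlap purely for illustration. It is not Viko's implementation, and `tokenize` and `top_k_passages` are made-up names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits only
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_passages(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: rank passages by TF-IDF overlap with the question."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: number of passages containing each term
    df = Counter(term for doc in docs for term in doc)
    q_terms = set(tokenize(question))

    def score(doc: Counter) -> float:
        # Sum of term frequency x smoothed inverse document frequency over shared terms
        return sum(doc[t] * math.log((n + 1) / (df[t] + 0.5)) for t in q_terms if t in doc)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [passages[i] for i in ranked[:k]]


passages = [
    "The bin collection schedule is every Tuesday morning.",
    "Call customer service on the number listed on the contact page.",
    "We offer next day delivery on most items.",
]
print(top_k_passages("what is the bin collection schedule?", passages, k=1))
# → ['The bin collection schedule is every Tuesday morning.']
```

Lexical overlap like this misses paraphrases (e.g. "bins" vs "bin"), which is exactly the gap that learned language models are used to close.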
Viko supports two forms of question answering:
Extractive answers - plucking a fragment of text from a document to answer a question. For example, given the question “do you offer next day delivery?”, the answer might be “we offer next day delivery” (assuming this fragment exists in a document).
Generative answers - generating an answer to a question, using source documents only as context. Using the previous example, a generative answer to the question “do you offer next day delivery?” might be “yes”, even though the source document (context) only contains the fragment “we offer next day delivery”.
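The extractive case can be illustrated with a toy example. Real extractive systems typically use a reading-comprehension model to predict an answer span; this minimal sketch instead assumes a simple heuristic: return the context sentence that shares the most content words with the question. The stopword list and `extract_answer` are hypothetical. The generative case is not sketched, since it needs a text-generation model.

```python
import re
from typing import Optional

# Hypothetical stopword list, just enough for the example
STOPWORDS = {"a", "an", "the", "do", "does", "you", "we", "is", "are", "what", "of", "to", "on"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def extract_answer(question: str, context: str) -> Optional[str]:
    """Toy extractive QA: return the sentence with the most question words, or None."""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    q_words = content_words(question)
    best, best_overlap = None, 0
    for sentence in sentences:
        overlap = len(q_words & content_words(sentence))
        if overlap > best_overlap:
            best, best_overlap = sentence, overlap
    return best


context = (
    "Orders ship from our UK warehouse. We offer next day delivery. "
    "Returns are free within 30 days."
)
print(extract_answer("do you offer next day delivery?", context))
# → We offer next day delivery.
```

The returned sentence corresponds to the extractive answer above; a generative system would go one step further and turn it into “yes”.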
Viko also offers semantic search capabilities, enhancing full-text search with reading comprehension. Viko uses machine learning and feedback loops to improve search accuracy. In that sense, Viko is self-learning and requires little ongoing tuning.
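One common way to close a feedback loop in search is to blend each result's base relevance score with how often users found it helpful. The sketch below assumes a click (or thumbs-up) signal for each shown result and uses a smoothed click-through rate. The `FeedbackReranker` class, its `weight` and `prior` parameters, and the document IDs are hypothetical, not Viko's actual mechanism.

```python
from collections import defaultdict


class FeedbackReranker:
    """Hypothetical re-ranker that nudges base scores using user feedback."""

    def __init__(self, weight: float = 0.5, prior: float = 1.0):
        self.weight = weight  # how far feedback can move a base score
        self.prior = prior    # pseudo-counts, so unseen documents start neutral (0.5)
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def record(self, doc_id: str, clicked: bool) -> None:
        # Log one impression and whether the user engaged with it
        self.shown[doc_id] += 1
        if clicked:
            self.clicked[doc_id] += 1

    def feedback_score(self, doc_id: str) -> float:
        # Smoothed click-through rate, strictly between 0 and 1
        return (self.clicked[doc_id] + self.prior) / (self.shown[doc_id] + 2 * self.prior)

    def rerank(self, base_scores: dict[str, float]) -> list[str]:
        def combined(doc_id: str) -> float:
            return base_scores[doc_id] + self.weight * (self.feedback_score(doc_id) - 0.5)

        return sorted(base_scores, key=combined, reverse=True)


reranker = FeedbackReranker()
base = {"page-shipping": 0.62, "faq-delivery": 0.60}
print(reranker.rerank(base))  # → ['page-shipping', 'faq-delivery']
for i in range(10):
    reranker.record("faq-delivery", clicked=i < 8)
    reranker.record("page-shipping", clicked=i < 1)
print(reranker.rerank(base))  # → ['faq-delivery', 'page-shipping']
```

The prior keeps a handful of early clicks from swamping the base score; in practice the blend weight would need validating against real query logs.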
The backend is 80% Python / 20% Scala, with Node.js (Express) used for APIs and orchestration. The main technologies are:
- Python
- BERT & T5
- TensorFlow/Keras
- Node.js (Express/TypeScript)
- Svelte (SvelteKit/TypeScript)
What did I learn?
We made a decision to pivot from a chatbot offering to our current search and question answering platform. The decision to pivot wasn’t based on demand (or lack thereof!) but on supply-side constraints. We discovered that onboarding clients and potential clients was just too painful for all involved: there was too much systems integration, client data feeds were typically poor quality, and the feedback loop was too slow.
We got into a vicious cycle: onboarding was slow and costly, and therefore only cost-effective for large organisations. Large organisations are often slow to respond, making the feedback loop even slower. We discovered that the real value of our chatbot platform lay in its ability to answer questions, and so we fell into question answering. We also discovered that, because we’re working with unstructured data, systems integration and data cleansing are much less of an issue.
So that was the big lesson learned: in the early stages of a project, a short feedback loop is essential.
Need help with your project?
Do you need some help or guidance with your project? Reach out to me (email is best).