
Can someone explain why I'd want to use fine-tuning instead of a vector database (or some other way of storing data/context)?


Assuming you would want to fine-tune over a codebase or set of documents, I would argue vector databases and fine-tuning are completely different tools.

I would strongly recommend against fine-tuning over a set of documents, as this makes for a very lossy information retrieval system. LLMs are not well suited to information retrieval the way databases and search engines are.

The application of fine-tuning where we are seeing a lot of success is making completion models like LLaMA or the original GPT-3 promptable: in essence, prompt-tuning or instruction-tuning. That is, giving the model the ability to respond to a user prompt in a chat-style interface (user prompt in, LLM output out).

Vector databases, for now, are a great way to store a mapping from document embeddings to the documents themselves, so that the documents relevant to a query can be retrieved.
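To make that mapping concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `toy_embed` is a hypothetical stand-in that just counts words (a real setup would call an embedding model), and `ToyVectorStore` is a hypothetical in-memory list, not the API of any actual vector database.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag of word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: each row pairs an embedding with its document."""

    def __init__(self):
        self.rows = []  # list of (embedding, document) pairs

    def add(self, document):
        self.rows.append((toy_embed(document), document))

    def search(self, query, k=1):
        """Return the k documents whose embeddings are closest to the query."""
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = ToyVectorStore()
store.add("fine-tuning adjusts model weights on new examples")
store.add("vector databases store embeddings next to documents")
top = store.search("where are embeddings stored", k=1)
print(top)
```

The point is only that the store returns the original document, not the embedding; the embedding is just the lookup key.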

I would highly recommend skimming this RLHF article for how demonstration data was used to make a model promptable [1]. Keep in mind RLHF is another concept altogether, and we might be seeing a revolution where it becomes optional (thanks to LIMA)!

1: https://huyenchip.com/2023/05/02/rlhf.html


Great reply, here's an example from my own work:

I want the user to be able to ask technical questions about a set of documents and get back a summarized answer drawn from those documents, along with a source.

I first need to finetune GPT4 so it better understands the niche-specific technical questions, the words used, etc. I could ask the finetuned model questions, but it won't really know from where it got the information. Without finetuning the summarised answer will suffer, or it will pull out the wrong papers.

Then I need to use a vector database to store the technical papers for the model to access; now I can ask questions, get a decent answer, and will have access to the sources.


Thanks (to both you and the parent) for sharing these details. So is it fair to say the following:

1. Fine-tuning bakes the knowledge into the model, but getting the "source" of an answer to a specific question becomes cagey and it is unclear if the answer is accurate or just a hallucination.

2. Therefore vector databases, which can provide context to the LLM before it answers, can solve this "citation" problem, BUT:

3. We then have limits because of the context window of the LLM to begin with.
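One way to picture the context-window limit in point 3 is as a budget that the retrieved chunks have to fit into. The sketch below is an assumption-laden illustration: `approx_tokens` is a hypothetical word-count estimate (real tokenizers count differently), and `pack_context` is a hypothetical greedy helper, not part of any library.

```python
def approx_tokens(text):
    # Assumption: one whitespace-separated word is roughly one token.
    return len(text.split())


def pack_context(ranked_chunks, budget):
    """Greedily keep the best-ranked chunks that still fit in `budget` tokens."""
    chosen, used = [], 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            continue  # this chunk would overflow; a later, smaller one may still fit
        chosen.append(chunk)
        used += cost
    return chosen, used


# Chunks are assumed to arrive already sorted by relevance, best first.
chunks = ["a b c d", "e f g h i j", "k l"]
chosen, used = pack_context(chunks, budget=7)
print(chosen, used)
```

Anything that does not fit is simply never seen by the model, which is why retrieval quality matters so much once the window is the bottleneck.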

Is that a fair understanding, or have I totally gotten this incorrect?

Edit: Or, are you saying that you both fine-tune AND also use a vector database which stores the embeddings of the dataset used to fine-tune the model?


Ah! That makes sense! That's a neat strategy!


I asked ChatGPT this question, and asked it to simplify as much as possible.

Fine-tuned Models: Imagine you have a super-smart robot that can talk about anything. But you want it to be really good at talking about, say, dinosaurs. So, you teach it more about dinosaurs specifically. That's what fine-tuning is – you're teaching the robot (or model) to be really good at a specific topic.

Vector Databases and Embeddings with LLM: This might be a little tricky, but let's think of it this way. Imagine you have a huge library of books and you want to find information on a specific topic, say, ancient Egypt. Now, instead of reading every book, you have a magical index that can tell you which books talk about ancient Egypt. This index is created by magically converting each book into a "summary dot" (that's the embedding). When you ask about ancient Egypt, your question is also converted into a "summary dot". Then, the magical index finds the books (or "summary dots") that are most similar to your question. That's how the vector database and embeddings work.

So, if you want your super-smart robot to be really good at one specific topic, you use fine-tuning. But if you want it to quickly find information from a huge library of knowledge, you use vector databases and embeddings. Sometimes, you might even use both for different parts of the same task!


First reason that comes to mind is you can make much smaller models, which helps with latency, cost and may enable you to run the model locally.


Fine Tuning = Output

Embeddings = Input

Fine-tuning is like a chef modifying a general pizza recipe to perfect a specific pizza, such as Neapolitan. This customization optimizes the result. In AI, fine-tuning adjusts a pre-existing model to perform better on a specific task.

Embeddings are like categorizing ingredients based on properties. They represent inputs so that similar inputs have similar representations. For instance, 'dog' and 'puppy' in an AI model have similar meanings. Like ingredients in a pizza, embeddings help the model understand and interpret the inputs. So, fine-tuning is about improving the model's performance, while embeddings help the model comprehend its inputs.

It turns out you can search a vector space of embeddings to find similar embeddings. If I turned the two paragraphs above into two embeddings and you searched for "golden retriever", then even though neither paragraph contains that exact phrase, the search should find that "golden retriever" is most similar to the second paragraph, the one that compares puppy to dog.
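A rough sketch of that search, with loudly made-up numbers: the two-dimensional vectors below are hand-picked for illustration (axes loosely meaning "animal-ness" and "food-ness"), not the output of any real embedding model, and the paragraph labels are hypothetical names for the two paragraphs above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-picked toy embeddings; axes: [animal-ness, food-ness].
paragraphs = {
    "fine-tuning / pizza paragraph": [0.1, 0.9],
    "embeddings / dog and puppy paragraph": [0.9, 0.1],
}

# Pretend embedding of the query "golden retriever" (also made up).
query = [0.8, 0.2]

best = max(paragraphs, key=lambda name: cosine(query, paragraphs[name]))
print(best)
```

No words are compared anywhere; the match comes purely from the vectors pointing in a similar direction, which is what lets a query hit text that never mentions it.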


I like to think of an LLM as a literal human. Not sure if it's the best analogy.

Fine tuning = Adding years of experience in a set environment, e.g. raise them in a home that only speaks Old English, have them learn Pig Latin, send them to a bootcamp.

Embedding = Giving them a book to reference information.

Just like a human, memory might fade a bit through the years but old habits die hard. You might not perfectly recollect what you learned years ago, but you still get the general idea, and if you took a class on the referenced book you'll be better at relaying information from it.

Edit: Asked ChatGPT to create the analogy.

A language model is like an intelligent person.

- Pre-training is their broad education and general knowledge.

- Fine-tuning is their years of specialized experience in a specific field.

- Embedding is like giving them a comprehensive book on a particular subject.

Just as a person gains knowledge, expertise, and specialized resources, the language model develops its understanding and performance through pre-training, fine-tuning, and embedding.


Fine-tuning could be useful to get a high text completion quality out of a small model within a specific domain. You would still use the resulting model alongside an info retrieval system to prompt with real context (unless you have a use case where hallucination is a feature).


Wouldn't a vector database just get you nearest-neighbors on the embeddings? How would that answer a generative or extractive question? I can see it might get you sentiment, but would it help with "tell me all the places that are mentioned in this review"?


i think the point is that you use the vector database to locate the relevant context to pass to the LLM for question answering. here’s an end-to-end example:

https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot


Right. You feed the text chunks (from the matched embeddings) to a generative LLM to do the extractive/summarization part.
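A minimal sketch of that hand-off, assuming the vector search has already returned its matches. `build_prompt` is a hypothetical helper, the prompt wording is just one possible phrasing, and the sample match (file name and sentence) is invented for illustration; the actual call to the LLM is left out.

```python
def build_prompt(question, matches):
    """Assemble an LLM prompt from retrieved chunks.

    matches: list of (source, chunk_text) pairs returned by the vector search.
    """
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(matches, 1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Invented example data standing in for real search results.
matches = [("review.txt", "We flew to Lisbon, then drove to Porto.")]
prompt = build_prompt("Which places are mentioned in this review?", matches)
print(prompt)
```

The nearest-neighbor step only picks which text goes into `Context:`; the extraction or summarization is done by whatever generative model receives the prompt.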


I've been playing with embedding documents using OpenAI embeddings for the past few weeks and, at least for my use case, the results are meh. It seems that sometimes just using context is not enough.

My next step is to play with fine-tuning, but I have no results to report yet.


Try using Instructor-XL for embeddings. It's got a more complex prompt structure for generating embeddings, which might be more useful.


Have you tried other models to generate embeddings? I am heading in that direction too, to create an additional layer of helpers for search. Also, if the document is not too big, it might fit into the initial context along with the prompt.


If the documents are large, try embedding smaller portions. If there's a heavy domain vocabulary, you might need a custom model.
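For the "smaller portions" part, a rough sketch of splitting a document into overlapping word windows before embedding each one. `chunk_words` is a hypothetical helper, and splitting on whitespace with a fixed size and overlap is an assumption; sentence- or section-based splitting may suit some documents better.

```python
def chunk_words(text, size, overlap=0):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_words("one two three four five six seven", size=3, overlap=1)
print(chunks)
```

Each chunk would then get its own embedding, so a query can match the relevant passage instead of an average over the whole document.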


I'd be very interested in knowing the outcome. Do you blog anywhere (or post on social)?


I think it probably works a lot better, but I would love to see some research validating this


I've read in a few places that it actually works worse in most cases. Much better to put the context in your prompt.


Fine-tuning + context will outperform context alone, and it's cheaper to burn cycles on fine-tuning and then use a smaller context than to use a larger context in production.


Fine-tuning + the same context will probably outperform context alone, but with a smaller context it does not seem to work as well, as GP stated.



