How to Improve LLMs with RAG


Imports

We start by installing and importing the necessary Python libraries.

!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install peft
!pip install auto-gptq
!pip install optimum
!pip install bitsandbytes
# if not running on Colab ensure transformers is installed too

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

Setting up the Knowledge Base

We can configure our knowledge base by defining the embedding model, the chunk size, and the chunk overlap. Here, we use the ~33M parameter bge-small-en-v1.5 embedding model from BAAI, which is available on the Hugging Face hub. Other embedding model options are listed on the text embedding leaderboard.

# import any embedding model on HF hub
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = None # we won't use LlamaIndex to set up LLM
Settings.chunk_size = 256
Settings.chunk_overlap = 25
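
To make the chunk size and chunk overlap settings concrete, here is a minimal sketch of overlapping windows over a token list. It is an illustration only, not LlamaIndex's splitter (which also respects sentence boundaries); the function name chunk_tokens and the toy tokens are assumptions made for this example.

```python
def chunk_tokens(tokens, chunk_size=256, chunk_overlap=25):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks

# placeholder "tokens" standing in for the output of a real tokenizer
tokens = [f"w{i}" for i in range(600)]
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

With these settings, each chunk repeats the last 25 tokens of the previous one, so a sentence cut at a chunk boundary still appears intact in at least one chunk.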

Next, we load our source documents. Here, I have a folder called "articles", which contains PDF versions of three Medium articles I wrote on fat tails. If you run this in Colab, you must download the articles folder from the GitHub repo and upload it manually to your Colab environment.

For each file in this folder, the function below reads the text from the PDF, splits it into chunks (based on the settings defined earlier), and stores each chunk in a list called documents.

documents = SimpleDirectoryReader("articles").load_data()

Since the blogs were downloaded directly as PDFs from Medium, they resemble a webpage more than a well-formatted article. As a result, some chunks may include text unrelated to the article, for example, webpage headers and Medium article recommendations.

In the code block below, I refine the chunks in documents, removing most of the chunks that come before or after the body of the article.

print(len(documents)) # prints: 71
for doc in documents:
    if "Member-only story" in doc.text:
        documents.remove(doc)
        continue
    if "The Data Entrepreneurs" in doc.text:
        documents.remove(doc)
    if " min read" in doc.text:
        documents.remove(doc)
print(len(documents)) # prints: 61
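
One caveat about the loop above: removing items from a list while iterating over it can cause Python to skip the element that follows each removed one. A possible alternative, sketched below, builds a new list instead. The helper name drop_boilerplate and the SimpleNamespace stand-ins are assumptions for illustration (only a .text attribute is needed), and this variant may keep a different number of chunks than the 61 reported above.

```python
from types import SimpleNamespace

BOILERPLATE_MARKERS = ("Member-only story", "The Data Entrepreneurs", " min read")

def drop_boilerplate(docs, markers=BOILERPLATE_MARKERS):
    """Return a new list without any doc whose text contains one of the markers."""
    return [d for d in docs if not any(m in d.text for m in markers)]

# hypothetical stand-ins for LlamaIndex Document objects (only .text is used)
sample_docs = [
    SimpleNamespace(text="Member-only story"),
    SimpleNamespace(text="5 min read"),
    SimpleNamespace(text="Fat tails are a more general idea than Power Laws."),
]
kept = drop_boilerplate(sample_docs)
print(len(kept))
```

In the real pipeline this would be called as documents = drop_boilerplate(documents) before building the index.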

Finally, we can store the refined chunks in a vector database.

index = VectorStoreIndex.from_documents(documents)

Setting up the Retriever

With our knowledge base in place, we can create a retriever using LlamaIndex's VectorIndexRetriever(), which returns the three chunks most similar to a user query.

# set number of docs to retrieve
top_k = 3
# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=top_k,
)
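
To show what "most similar" means in practice, here is a minimal sketch of top-k retrieval over embedding vectors. It assumes cosine similarity as the metric and uses toy 2-dimensional vectors; the names cosine_similarity and retrieve_top_k are made up for this example and are not part of LlamaIndex.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, chunk_vecs, k=3):
    """Return the k highest (score, chunk_index) pairs, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return scored[:k]

# toy embeddings standing in for real bge-small-en-v1.5 vectors
toy_chunks = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
top = retrieve_top_k([1.0, 0.0], toy_chunks, k=3)
print(top)
```

A real vector store performs the same ranking, just over higher-dimensional vectors and usually with an index that avoids scoring every chunk.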

Next, we define a query engine that uses the retriever and the query to return a set of relevant chunks.

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
)
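
The similarity cutoff can be pictured as a simple filter applied after retrieval. The sketch below assumes that chunks scoring below the cutoff are dropped; the function name and the scores are invented for illustration. It also shows why fewer than top_k chunks can come back.

```python
def apply_similarity_cutoff(scored_chunks, cutoff=0.5):
    """Keep only (score, text) pairs whose score is at least the cutoff."""
    return [(score, text) for score, text in scored_chunks if score >= cutoff]

# made-up retrieval results: (similarity score, chunk text)
retrieved = [
    (0.82, "Fat-tailedness is the degree to which rare events drive statistics."),
    (0.61, "Fat tails live on a spectrum from Gaussian to Pareto 80-20."),
    (0.43, "Recommended articles from Medium."),
]
relevant = apply_similarity_cutoff(retrieved, cutoff=0.5)
print(len(relevant))
```

Because the filter can return fewer items than were retrieved, downstream code should loop over whatever comes back rather than assume exactly top_k chunks.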

Use the Query Engine

Now, with our knowledge base and retrieval system set up, let's use them to return chunks relevant to a query. Here, we pass the same technical question we asked ShawGPT (the YouTube comment responder) in the previous article.

query = "What is fat-tailedness?"
response = query_engine.query(query)

The query engine returns a response object containing the text, metadata, and indexes of the relevant chunks. The code block below produces a more readable version of this information.

# reformat response
context = "Context:\n"
# slice instead of indexing so a chunk dropped by the similarity cutoff cannot raise IndexError
for node in response.source_nodes[:top_k]:
    context = context + node.text + "\n\n"
print(context)

Context:
Some of the controversy might be explained by the observation that log-
normal distributions behave like Gaussian for low sigma and like Power Law
at high sigma [2].
However, to avoid controversy, we can depart (for now) from whether some
given data fits a Power Law or not and focus instead on fat tails.
Fat-tailedness — measuring the space between Mediocristan
and Extremistan
Fat Tails are a more general idea than Pareto and Power Law distributions.
One way we can think about it is that “fat-tailedness” is the degree to which
rare events drive the aggregate statistics of a distribution. From this point of
view, fat-tailedness lives on a spectrum from not fat-tailed (i.e. a Gaussian) to
very fat-tailed (i.e. Pareto 80 – 20).
This maps directly to the idea of Mediocristan vs Extremistan discussed
earlier. The image below visualizes different distributions across this
conceptual landscape [2].
print("mean kappa_1n = " + str(np.mean(kappa_dict[filename])))
print("")
Mean κ (1,100) values from 1000 runs for each dataset. Image by author.
These more stable results indicate Medium followers are the most fat-tailed,
followed by LinkedIn Impressions and YouTube earnings.
Note: One can compare these values to Table III in ref [3] to better understand each
κ value. Namely, these values are comparable to a Pareto distribution with α
between 2 and 3.
Although each heuristic told a slightly different story, all signs point toward
Medium followers gained being the most fat-tailed of the 3 datasets.
Conclusion
While binary labeling data as fat-tailed (or not) may be tempting, fat-
tailedness lives on a spectrum. Here, we broke down 4 heuristics for
quantifying how fat-tailed data are.
Pareto, Power Laws, and Fat Tails
What they don’t teach you in statistics
towardsdatascience.com
Although Pareto (and more generally power law) distributions give us a
salient example of fat tails, this is a more general notion that lives on a
spectrum ranging from thin-tailed (i.e. a Gaussian) to very fat-tailed (i.e.
Pareto 80 – 20).
The spectrum of Fat-tailedness. Image by author.
This view of fat-tailedness provides us with a more flexible and precise way of
categorizing data than simply labeling it as a Power Law (or not). However,
this begs the question: how do we define fat-tailedness?
4 Ways to Quantify Fat Tails

Adding RAG to LLM

We start by downloading the fine-tuned model from the Hugging Face hub.

# load fine-tuned model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
config = PeftConfig.from_pretrained("shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "shawhin/shawgpt-ft")
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

As a baseline, we can see how the model responds to the technical question without any context from the articles. To do this, we create a prompt template using a lambda function, which takes a viewer comment and returns a prompt for the LLM. For more details on where this prompt comes from, see the previous article in this series.

# prompt (no context)
intstructions_string = f"""ShawGPT, functioning as a virtual data science \
consultant on YouTube, communicates in clear, accessible language, escalating \
to technical depth upon request. It reacts to feedback aptly and ends \
responses with its signature '–ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, \
providing concise acknowledgments to brief expressions of gratitude or \
feedback, thus keeping the interaction natural and engaging.
Please respond to the following comment.
"""
prompt_template = lambda comment: f'''[INST] {intstructions_string} \n{comment} \n[/INST]'''
comment = "What is fat-tailedness?"
prompt = prompt_template(comment)
print(prompt)

[INST] ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, thus keeping the interaction natural and engaging.
Please respond to the following comment.
What is fat-tailedness? 
[/INST]

Next, we can pass this prompt to the model using the code below.

model.eval()
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"),
                         max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])

Here's the model's response (no context).

Great question!
Fat-tailedness is a statistical property of a distribution. In simple terms, 
it refers to the presence of extreme outliers or heavy tails in the 
distribution.
For instance, consider the distribution of heights in a population. A normal 
distribution would have most people clustered around an average height with a 
few people deviating slightly from the mean. However, in a fat-tailed 
distribution, you would observe a larger number of people being significantly 
taller or shorter than the average.
In finance, fat-tailedness is often used to describe the presence of extreme 
market moves or outliers in financial returns. It's an important concept in 
risk management and financial modeling.
I hope this explanation helps! Let me know if you have any questions.
–ShawGPT

Although the style and formatting of the response are great, the model's explanation of fat-tailedness differs from how I defined it in my video and blog series.

Let's see what happens to the model's response when we include the appropriate context. To do this, we create another prompt template, which can also take in context from the RAG system.

# prompt (with context)
prompt_template_w_context = lambda context, comment: f"""[INST]ShawGPT, functioning as a virtual data science consultant on YouTube, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–ShawGPT'. \
ShawGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.
{context}
Please respond to the following comment. Use the context above if it is helpful.
{comment}
[/INST]
"""

Next, we pass the prompt, with the context from the query engine and the viewer comment, to the model.

prompt = prompt_template_w_context(context, comment)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])
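
Before looking at the output, note that the prompt passed to the model here is plain string assembly: instructions, then the retrieved context, then the comment. A rough sketch of that assembly as a single named function is shown below. The name build_prompt, the shortened instruction text, and the behavior of omitting the context section when no chunks are supplied are all assumptions for illustration, not the templates used in this article.

```python
def build_prompt(instructions, comment, chunks=()):
    """Assemble an [INST] prompt, adding a context section only when chunks exist."""
    if chunks:
        context = "Context:\n" + "\n\n".join(chunks) + "\n\n"
        hint = " Use the context above if it is helpful."
    else:
        context = ""
        hint = ""
    return (
        f"[INST]{instructions}\n\n"
        f"{context}"
        f"Please respond to the following comment.{hint}\n\n"
        f"{comment}\n[/INST]"
    )

# shortened placeholder instructions, not the full ShawGPT system text
demo_prompt = build_prompt(
    "Answer briefly and sign off with '-ShawGPT'.",
    "What is fat-tailedness?",
    chunks=["chunk A", "chunk B"],
)
print(demo_prompt)
```

Keeping the no-context and with-context cases in one function makes it harder for the two prompts to drift apart when the instructions change.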

Here's the new response (with context).

Great question!
Fat-tailedness is a property of a distribution that describes how much the 
presence of extreme values (rare events) affects the overall statistics of the 
distribution.
In the context of the video, the author discusses how Mediocristan and 
Extremistan distributions differ in terms of fat-tailedness. Mediocristan 
distributions have a relatively small impact from extreme values, while 
Extremistan distributions are heavily influenced by extreme values.
The author also mentions that fat-tailedness is a more general concept than 
specific distributions like Pareto or Power Laws. Instead, it lives on a 
spectrum ranging from thin-tailed (Gaussian) to very fat-tailed (Pareto 80-20).
I hope that helps clarify things a bit! Let me know if you have any questions.
–ShawGPT

This does a much better job of capturing my explanation of fat tails than the no-context response, and it even calls out the niche concepts of Mediocristan and Extremistan.

Here, I gave a beginner-friendly introduction to RAG and shared a concrete example of how to implement it using LlamaIndex. RAG allows us to improve an LLM system with updateable and domain-specific knowledge.

While much of the recent AI hype has centered on building AI assistants, a powerful (yet less popular) innovation has come from text embeddings (i.e. the things we used here to do retrieval). In the next article of this series, I will explore text embeddings in more detail, including how they can be used for semantic search and classification tasks.
