
新智元 report
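[新智元 digest] "Fine-tune your model and get better performance than GPT-4" is more than a slogan; it is something you can actually do. Recently, a hands-on ML engineer fine-tuned several open-source LLMs to behave the way they wanted.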



Loading the dataset
test_dataset
Dataset({features: ['name', 'eventrefnumber', 'text', 'StartDate', 'eventtype', 'province', 'citydistrict', 'village', 'targetgroup', 'commander', 'position', 'minkilled', 'mincaptured', 'capturedcharacterisation', 'killedcharacterisation', 'killq', 'captureq', 'killcaptureraid', 'airstrike', 'noshotsfired', 'dataprocessed', 'flagged', 'glossarymeta', 'minleaderskilled', 'minfacilitatorskilled', 'minleaderscaptured', 'minfacilitatorscaptured', 'leaderq'],num_rows: 724})
[IsafEvent(name='5',text='2013-01-S-025\n\nKABUL, Afghanistan (Jan. 25, 2013)\nDuring a security operation in Andar district, Ghazni province, yesterday, an Afghan and coalition force killed the Taliban leader, Alaudin. Alaudin oversaw a group of insurgents responsible for conducting remote-controlled improvised explosive device and small-arms fire attacks against Afghan and coalition forces. Prior to his death, Alaudin was planning attacks against Afghan National Police in Ghazni province.',start_date=datetime.date(2013, 1, 24),event_type={'insurgentskilled'},province={'ghazni'},target_group={'taliban'},min_killed=1,min_captured=0,killq=True,captureq=False,killcaptureraid=False,airstrike=False,noshotsfired=False,min_leaders_killed=1,min_leaders_captured=0,predictions={}),IsafEvent(name='2',text='2011-11-S-034\nISAF Joint Command - Afghanistan\nFor Immediate Release\n\nKABUL, Afghanistan (Nov. 20, 2011)\nA coalition security force detained numerous suspected insurgents during an operation in Marjeh district, Helmand province, yesterday. The force conducted the operation after receiving information that a group of insurgents were at a compound in the area. After calling for the men inside to come out peacefully, the insurgents emerged and were detained without incident.',start_date=datetime.date(2011, 11, 19),event_type={'detention'},province={'helmand'},target_group={''},min_killed=0,min_captured=4,killq=False,captureq=True,killcaptureraid=True,airstrike=False,noshotsfired=False,min_leaders_killed=0,min_leaders_captured=0,predictions={})]
json_str = events[0].model_dump_json(exclude={"text", "predictions"})
print(json_str)
{"name":"5","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}
from openai import OpenAI
from rich import print
import json
import os


def query_openai(article_text: str, model: str) -> str:
    query = (
        f"The following is a press release issued by ISAF (formerly operating in Afghanistan):\n{article_text}\n\n"
        "## Extraction request\n"
        "Please extract the following information from the press release:\n"
        "- The name of the event (summarising the event / text as a headline)\n"
        "- The start date of the event\n"
        "- The event type(s)\n"
        "- The province(s) in which the event occurred\n"
        "- The target group(s) of the event\n"
        "- The minimum number of people killed during the event\n"
        "- The minimum number of people captured during the event\n"
        "- Whether someone was killed or not during the event\n"
        "- Whether someone was captured or not during the event\n"
        "- Whether the event was a so-called 'kill-capture raid'\n"
        "- Whether an airstrike was used during the event\n"
        "- Whether no shots were fired during the event\n"
        "- The minimum number of leaders killed during the event\n"
        "- The minimum number of leaders captured during the event\n\n"
        "## Annotation notes:\n"
        "- A 'faciliator' is not a leader.\n"
        "- If a press release states that 'insurgents' were detained without further "
        "details, assign a minimum number of two detained. Interpret 'a couple' as "
        "two. Interpret 'several' as at least three, even though it may sometimes "
        "refer to seven or eight. Classify the terms 'a few', 'some', 'a group', 'a "
        "small group', and 'multiple' as denoting at least three, even if they "
        "sometimes refer to larger numbers. Choose the smaller number if no other "
        "information is available in the press release to come up with a minimally "
        "acceptable figure. Interpret 'numerous' and 'a handful' as at least four, "
        "and 'a large number' as at least five.\n\n"
        "## Example:\n"
        "Article text: 'ISAF Joint Command Evening Operational Update Feb. 19, 2011\nISAF Joint Command - "
        "Afghanistan\u20282011-02-S-143\u2028For Immediate Release \u2028\u2028KABUL, Afghanistan (Feb. 19)\u2028\u2028ISAF "
        "service members at a compound in Sangin district, Helmand province observed numerous insurgents north and south of "
        "their position talking on radios today. After gaining positive identification of the insurgent positions, the "
        "coalition troops engaged, killing several insurgents. Later, the ISAF troops observed more insurgents positioning "
        "in the area with weapons. After positive identification, coalition forces continued firing on the various insurgent "
        "positions, resulting in several more insurgents being killed.'\n\n"
        'Output: `{"name":"Several insurgents killed in '
        'Helmand","start_date":"2011-02-18","event_type":["insurgentskilled"],"province":["helmand"],"target_group":[""],"mi'
        'n_killed":6,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":false,"noshotsfired"'
        ':false,"min_leaders_killed":0,"min_leaders_captured":0}`'
    )

    # set up the prediction harness
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": "You are an expert at identifying events in a press release. You are precise "
                "and always make sure you are correct, drawing inference from the text of the "
                "press release.\n\n You always return a JSON string with the following schema: "
                "## JSON Schema details\n"
                "Here is some of the schema for the JSON output string you "
                "should make use of: event_types = ['airstrike', 'detention', "
                "'captureandkill', 'insurgentskilled', 'exchangeoffire', 'civiliancasualty'], "
                "provinces = ['badakhshan', 'badghis', 'baghlan', 'balkh', 'bamyan', "
                "'day_kundi', 'farah', 'faryab', 'ghazni', 'ghor', 'helmand', 'herat', "
                "'jowzjan', 'kabul', 'kandahar', 'kapisa', 'khost', 'kunar', 'kunduz', "
                "'laghman', 'logar', 'nangarhar', 'nimroz', 'nuristan', 'paktya', 'paktika', "
                "'panjshir', 'parwan', 'samangan', 'sar_e_pul', 'takhar', 'uruzgan', "
                "'wardak', 'zabul'], target_groups = ['taliban', 'haqqani', 'criminals', "
                "'aq', 'hig', 'let', 'imu', 'judq', 'iju', 'hik', 'ttp', 'other']\n\n",
            },
            {"role": "user", "content": query},
        ],
        temperature=1,
    )

    return response.choices[0].message.content
json_str = query_openai(events[0].text, "gpt-4o")
print(json.loads(json_str))
{'name': 'Taliban leader Alaudin killed in Ghazni','start_date': '2013-01-24','event_type': ['insurgentskilled'],'province': ['ghazni'],'target_group': ['taliban'],'min_killed': 1,'min_captured': 0,'killq': True,'captureq': False,'killcaptureraid': True,'airstrike': False,'noshotsfired': False,'min_leaders_killed': 1,'min_leaders_captured': 0}
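Setting this response next to the gold annotation printed earlier shows where GPT-4o diverges. The following is a minimal sketch of a field-by-field comparison, assuming both sides are available as plain dicts. The helper name `diff_fields` is hypothetical and not part of the original code, and `name` is skipped on the assumption that the gold value ('5') is not a headline to be matched.

```python
import json

# Gold annotation and GPT-4o response, copied from the outputs shown above.
gold = json.loads(
    '{"name":"5","start_date":"2013-01-24","event_type":["insurgentskilled"],'
    '"province":["ghazni"],"target_group":["taliban"],"min_killed":1,'
    '"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,'
    '"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,'
    '"min_leaders_captured":0}'
)
predicted = {
    "name": "Taliban leader Alaudin killed in Ghazni",
    "start_date": "2013-01-24",
    "event_type": ["insurgentskilled"],
    "province": ["ghazni"],
    "target_group": ["taliban"],
    "min_killed": 1,
    "min_captured": 0,
    "killq": True,
    "captureq": False,
    "killcaptureraid": True,
    "airstrike": False,
    "noshotsfired": False,
    "min_leaders_killed": 1,
    "min_leaders_captured": 0,
}


def diff_fields(gold, predicted, ignore=("name",)):
    """Hypothetical helper: map each mismatching field to (gold, predicted)."""
    return {
        key: (gold[key], predicted.get(key))
        for key in gold
        if key not in ignore and gold[key] != predicted.get(key)
    }


print(diff_fields(gold, predicted))
```

For this press release the only disagreement is `killcaptureraid`, which the gold data marks as false and GPT-4o marks as true.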
print(events[0])
IsafEvent(name='5',text='2013-01-S-025\n\nKABUL, Afghanistan (Jan. 25, 2013)\nDuring a security operation in Andar district, Ghazni province, yesterday, an Afghan and coalition force killed the Taliban leader, Alaudin. Alaudin oversaw a group of insurgents responsible for conducting remote-controlled improvised explosive device and small-arms fire attacks against Afghan and coalition forces. Prior to his death, Alaudin was planning attacks against Afghan National Police in Ghazni province.',start_date=datetime.date(2013, 1, 24),event_type={'insurgentskilled'},province={'ghazni'},target_group={'taliban'},min_killed=1,min_captured=0,killq=True,captureq=False,killcaptureraid=False,airstrike=False,noshotsfired=False,min_leaders_killed=1,min_leaders_captured=0,predictions={'gpt-4o': '{\n "name": "Taliban leader Alaudin killed in Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["insurgentskilled", "captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-4-turbo': '{\n "name": "Taliban leader Alaudin killed in Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-3.5-turbo': '{\n "name": "Taliban leader Alaudin killed in Ghazni province",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": false,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n 
"min_leaders_captured": 0\n}'})
from datetime import date
from typing import List

from datasets import Dataset
from pydantic import BaseModel


def convert_to_dataset(data: List[BaseModel]) -> Dataset:
    dataset_dict = {}

    # NOTE: __fields__ and outer_type_ are pydantic v1-style attributes.
    for field_name, field_value in data[0].__fields__.items():
        field_type = field_value.outer_type_
        if field_type in [str, int, float, bool, date]:
            # scalar fields are copied through unchanged
            dataset_dict[field_name] = [getattr(item, field_name) for item in data]
        elif field_type == set:
            # sets are converted to lists
            dataset_dict[field_name] = [list(getattr(item, field_name)) for item in data]
        elif issubclass(field_type, BaseModel):
            # nested models are converted to plain dicts
            dataset_dict[field_name] = [getattr(item, field_name).dict() for item in data]
        else:
            dataset_dict[field_name] = [getattr(item, field_name) for item in data]

    dataset = Dataset.from_dict(dataset_dict)
    return dataset
convert_and_push_dataset(events, "isafpressreleases_with_preds", split_name="test")
Adding predictions from the fine-tuned models
Reloading the predictions dataset
from datasets import load_dataset

preds_test_data = load_dataset("strickvl/isafpressreleases_with_preds")["test"].to_list()
Predictions from the fine-tuned TinyLlama
from rich import print

print(preds_test_data[0])
{'name': '5','text': '2013-01-S-025\n\nKABUL, Afghanistan (Jan. 25, 2013)\nDuring a security operation in Andar district, Ghazni province, yesterday, an Afghan and coalition force killed the Taliban leader, Alaudin. Alaudin oversaw a group of insurgents responsible for conducting remote-controlled improvised explosive device and small-arms fire attacks against Afghan and coalition forces. Prior to his death, Alaudin was planning attacks against Afghan National Police in Ghazni province.','predictions': {'gpt-3.5-turbo': '{\n "name": "Taliban leader Alaudin killed in Ghazni province",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": false,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-4-turbo': '{\n "name": "Taliban leader Alaudin killed in Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-4o': '{\n "name": "Taliban leader Alaudin killed in Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["insurgentskilled", "captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','tinyllama-templatefree': '\n{"name":"Taliban leader killed in 
Ghazni","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','tinyllama-sharegpt':'{"name":"2","start_date":"2013-01-24","event_type":["airstrike"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":true,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}'},'start_date': datetime.date(2013, 1, 24),'province': ['ghazni'],'target_group': ['taliban'],'event_type': ['insurgentskilled'],'min_killed': 1,'min_captured': 0,'killq': True,'captureq': False,'killcaptureraid': False,'airstrike': False,'noshotsfired': False,'min_leaders_killed': 1,'min_leaders_captured': 0}
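Predictions from the fine-tuned Mistral
Predictions from the fine-tuned OpenAI model
Fine-tuned Mistral model (via OpenPipe)
Fine-tuned Solar LLM (via Predibase)
Predictions from the fine-tuned Llama3 (via OpenPipe)
from rich import print

print(preds_test_data[0])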
{'name': '5','text': '2013-01-S-025\n\nKABUL, Afghanistan (Jan. 25, 2013)\nDuring a security operation in Andar district, Ghazni province, yesterday, an Afghan and coalition force killed the Taliban leader, Alaudin. Alaudin oversaw a group of insurgents responsible for conducting remote-controlled improvised explosive device and small-arms fire attacks against Afghan and coalition forces. Prior to his death, Alaudin was planning attacks against Afghan National Police in Ghazni province.','predictions': {'finetuned-llama3-7b-32k-openpipe':'{"name":"1","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":true,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','finetuned-mistral-7b-optimised-openpipe':'{"name":"1","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":true,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','finetuned-openai-gpt-3.5-turbo-1106':'{"name":"4","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":true,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','gpt-3.5-turbo': '{\n "name": "Taliban leader Alaudin killed in Ghazni province",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": false,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-4-turbo': '{\n "name": "Taliban leader Alaudin killed in 
Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','gpt-4o': '{\n "name": "Taliban leader Alaudin killed in Ghazni",\n "start_date": "2013-01-24",\n "event_type": ["insurgentskilled", "captureandkill"],\n "province": ["ghazni"],\n "target_group": ["taliban"],\n "min_killed": 1,\n "min_captured": 0,\n "killq": true,\n "captureq": false,\n "killcaptureraid": true,\n "airstrike": false,\n "noshotsfired": false,\n "min_leaders_killed": 1,\n "min_leaders_captured": 0\n}','mistral-lora-templatefree': '1','tinyllama-sharegpt':'{"name":"2","start_date":"2013-01-24","event_type":["airstrike"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":true,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','tinyllama-templatefree': '\n{"name":"Taliban leader killed in Ghazni","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":false,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}','ft-solar-1-mini-chat-240612-predibase':'\n\n{"name":"2","start_date":"2013-01-24","event_type":["insurgentskilled"],"province":["ghazni"],"target_group":["taliban"],"min_killed":1,"min_captured":0,"killq":true,"captureq":false,"killcaptureraid":true,"airstrike":false,"noshotsfired":false,"min_leaders_killed":1,"min_leaders_captured":0}'},'start_date': datetime.date(2013, 1, 24),'province': ['ghazni'],'target_group': ['taliban'],'event_type': ['insurgentskilled'],'min_killed': 1,'min_captured': 0,'killq': True,'captureq': 
False,'killcaptureraid': False,'airstrike': False,'noshotsfired': False,'min_leaders_killed': 1,'min_leaders_captured': 0}
JSON validity test
from datasets import load_dataset

dataset_with_preds = load_dataset("strickvl/isafpressreleases_test_predictions")["train"].to_list()

# find out how many of the predictions are None values or empty strings
missing_values = {
    "gpt-4o": 0,
    "gpt-4-turbo": 0,
    "gpt-3.5-turbo": 0,
    "tinyllama-templatefree": 0,
    "tinyllama-sharegpt": 0,
    "finetuned-openai-gpt-3.5-turbo-1106": 0,
    "finetuned-llama3-7b-32k-openpipe": 0,
    "mistral-lora-templatefree": 0,
    "finetuned-mistral-7b-optimised-openpipe": 0,
    "ft-solar-1-mini-chat-240612-predibase": 0,
}

for row in dataset_with_preds:
    for model in row["predictions"]:
        if row["predictions"][model] is None or row["predictions"][model] == "":
            missing_values[model] += 1

print(missing_values)
{'gpt-4o': 0,'gpt-4-turbo': 0,'gpt-3.5-turbo': 0,'tinyllama-templatefree': 0,'tinyllama-sharegpt': 38,'finetuned-openai-gpt-3.5-turbo-1106': 0,'finetuned-llama3-7b-32k-openpipe': 0,'mistral-lora-templatefree': 0,'finetuned-mistral-7b-optimised-openpipe': 0,'ft-solar-1-mini-chat-240612-predibase': 0}
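The count above only catches missing or empty predictions. A prediction can be non-empty and still unusable, as with the bare '1' returned by `mistral-lora-templatefree` in the sample row. The sketch below shows one way to count predictions that parse into a JSON object. The helpers `is_valid_json` and `count_valid_json` and the toy rows are illustrative assumptions, not the author's code.

```python
import json


def is_valid_json(raw):
    """Hypothetical helper: True if raw is a string that parses to a JSON object."""
    if not isinstance(raw, str) or not raw.strip():
        return False
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False


def count_valid_json(rows):
    """Count, per model, how many predictions parse to a JSON object."""
    counts = {}
    for row in rows:
        for model, raw in row["predictions"].items():
            counts[model] = counts.get(model, 0) + int(is_valid_json(raw))
    return counts


# Toy rows shaped like the entries of dataset_with_preds (assumed layout).
sample_rows = [
    {"predictions": {"model-a": '{"killq": true}', "model-b": "1"}},
    {"predictions": {"model-a": '\n{"killq": false}', "model-b": ""}},
]

print(count_valid_json(sample_rows))
```

Requiring a dict rather than any valid JSON value is a deliberate choice here: the string '1' is valid JSON, but it carries none of the expected fields.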
start_date
province
target_group
event_type
min_killed
min_captured
killq
captureq
killcaptureraid
airstrike
noshotsfired
min_leaders_killed
min_leaders_captured
Test results
Start date accuracy
{'gpt-4o': 527,'gpt-4-turbo': 522,'gpt-3.5-turbo': 492,'tinyllama-templatefree': 231,'tinyllama-sharegpt': 479,'finetuned-openai-gpt-3.5-turbo-1106': 646,'finetuned-llama3-7b-32k-openpipe': 585,'mistral-lora-templatefree': 0,'finetuned-mistral-7b-optimised-openpipe': 636,'ft-solar-1-mini-chat-240612-predibase': 649}
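The scoring code behind these counts is not shown, so the following is only a sketch of how a per-model tally for `start_date` could be computed. It assumes each row stores the gold `start_date` as a `datetime.date` and each prediction as a JSON string, as in the sample row printed earlier. The function names and toy rows are hypothetical.

```python
import json
from datetime import date


def start_date_correct(row, model):
    """Hypothetical check: does the model's start_date match the gold date?"""
    raw = row["predictions"].get(model)
    try:
        predicted = date.fromisoformat(json.loads(raw)["start_date"])
    except (TypeError, ValueError, KeyError):
        # Missing, unparseable, or malformed predictions count as wrong.
        return False
    return predicted == row["start_date"]


def count_correct(rows, models, is_correct):
    """Tally correct predictions per model using the supplied check."""
    return {m: sum(is_correct(r, m) for r in rows) for m in models}


# Toy rows, not real data.
rows = [
    {"start_date": date(2013, 1, 24),
     "predictions": {"model-a": '{"start_date": "2013-01-24"}', "model-b": "1"}},
    {"start_date": date(2011, 11, 19),
     "predictions": {"model-a": '{"start_date": "2011-11-20"}', "model-b": None}},
]

print(count_correct(rows, ["model-a", "model-b"], start_date_correct))
```

Under this scheme a model whose outputs never parse would score 0 on every field, which would be consistent with the `mistral-lora-templatefree` entry above.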

Province accuracy
{'gpt-4o': 649,'gpt-4-turbo': 645,'gpt-3.5-turbo': 595,'tinyllama-templatefree': 335,'tinyllama-sharegpt': 660,'finetuned-openai-gpt-3.5-turbo-1106': 704,'finetuned-llama3-7b-32k-openpipe': 707,'mistral-lora-templatefree': 0,'finetuned-mistral-7b-optimised-openpipe': 711,'ft-solar-1-mini-chat-240612-predibase': 704}
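Province is a multi-valued field, so a comparison probably should not depend on the order in which a model lists the provinces. The sketch below compares the values as sets. Whether the original evaluation required an exact set match is an assumption, and `province_correct` and the toy rows are hypothetical.

```python
import json


def province_correct(row, model):
    """Hypothetical check: order-insensitive exact match on the province list."""
    try:
        predicted = json.loads(row["predictions"][model])["province"]
    except (TypeError, ValueError, KeyError):
        return False
    # Reject anything that is not a list of strings (e.g. a bare string).
    if not isinstance(predicted, list) or not all(isinstance(p, str) for p in predicted):
        return False
    return set(predicted) == set(row["province"])


# Toy rows, not real data.
rows = [
    {"province": ["ghazni"],
     "predictions": {"model-a": '{"province": ["ghazni"]}'}},
    {"province": ["helmand", "kandahar"],
     "predictions": {"model-a": '{"province": ["kandahar", "helmand"]}'}},
    {"province": ["helmand"],
     "predictions": {"model-a": '{"province": "helmand"}'}},
]

print([province_correct(r, "model-a") for r in rows])
```

The same pattern would apply to the other list-valued fields, `target_group` and `event_type`.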

Target group accuracy

Event type accuracy

min_killed accuracy

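Annotation notes: a 'facilitator' is not a leader. If a press release states that 'insurgents' were detained without further details, assign a minimum of two detained. Interpret 'a couple' as two. Interpret 'several' as at least three, even though it may sometimes refer to seven or eight. Interpret 'a few', 'some', 'a group', 'a small group', and 'multiple' as at least three, even if they sometimes refer to larger numbers. If the press release gives no other information from which to derive a minimally acceptable figure, choose the smaller number. Interpret 'numerous' and 'a handful' as at least four, and 'a large number' as at least five.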
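These annotation rules amount to a lookup from vague quantity words to minimum counts. A minimal sketch of that lookup follows. The numbers come from the notes above, while the names `MIN_COUNT_BY_TERM`, `DEFAULT_MIN`, and `minimum_count` are hypothetical, and falling back to two for unlisted phrases is an assumption based on the rule for unspecified 'insurgents'.

```python
# Minimum counts per quantity term, taken from the annotation notes.
MIN_COUNT_BY_TERM = {
    "a couple": 2,
    "several": 3,
    "a few": 3,
    "some": 3,
    "a group": 3,
    "a small group": 3,
    "multiple": 3,
    "numerous": 4,
    "a handful": 4,
    "a large number": 5,
}

# Assumed fallback: plural 'insurgents' with no further detail means at least two.
DEFAULT_MIN = 2


def minimum_count(phrase):
    """Hypothetical helper: map a quantity phrase to its minimum count."""
    return MIN_COUNT_BY_TERM.get(phrase.strip().lower(), DEFAULT_MIN)


# 'several' killed, then 'several more' killed, as in the prompt's example.
print(minimum_count("several") + minimum_count("several"))
# 'numerous' suspected insurgents detained, as in the Helmand press release.
print(minimum_count("Numerous"))
```

The two usage lines mirror cases from the data: the prompt's worked example records `min_killed` as 6 for two mentions of 'several', and the Helmand detention release records `min_captured` as 4 for 'numerous'.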
min_captured accuracy

killq accuracy

captureq accuracy

killcaptureraid accuracy

airstrike accuracy

noshotsfired accuracy

min_leaders_killed accuracy

min_leaders_captured accuracy

Final scores
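How the final scores were aggregated is not shown here. As one plausible reading, raw counts can be turned into accuracy rates by dividing by the size of the evaluation set. The sketch below does this for the start-date counts reported earlier, assuming the evaluation set has the 724 rows reported for `test_dataset`. The helper `to_accuracy` is hypothetical.

```python
NUM_ROWS = 724  # assumed evaluation set size, from the test_dataset printout

# Start-date counts as reported earlier in the article.
start_date_counts = {
    "gpt-4o": 527,
    "gpt-4-turbo": 522,
    "gpt-3.5-turbo": 492,
    "tinyllama-templatefree": 231,
    "tinyllama-sharegpt": 479,
    "finetuned-openai-gpt-3.5-turbo-1106": 646,
    "finetuned-llama3-7b-32k-openpipe": 585,
    "mistral-lora-templatefree": 0,
    "finetuned-mistral-7b-optimised-openpipe": 636,
    "ft-solar-1-mini-chat-240612-predibase": 649,
}


def to_accuracy(counts, total):
    """Hypothetical helper: convert correct-prediction counts to rates."""
    return {model: correct / total for model, correct in counts.items()}


accuracy = to_accuracy(start_date_counts, NUM_ROWS)
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)

for model, rate in ranked:
    print(f"{model}: {rate:.1%}")
```

Averaging such rates across all fields would give a single score per model, though that aggregation is a guess rather than the author's stated method.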

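Data privacy (none of your information is sent to OpenAI)
Smaller models most likely mean better performance (although I still need to test and prove this)
More control overall
Cost improvements
Fine-tuning works remarkably well, but...
Evaluation is painful
But it tells you whether you are making progress
What's next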




