Hello Nas,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that your GPT-4o fine-tuning job failed.
To accurately resolve the issue, kindly follow these step-by-step recommendations:
- Validate the schema format. Each line in your .jsonl file must have exactly the following structure:
{"prompt": "Translate to French: Hello", "completion": "Bonjour"}
  - No extra keys, no nesting, no arrays.
  - Each line must be valid standalone JSON (newline-delimited).
- Use the official data preparation tool instead of validating manually. Run the OpenAI CLI tool from a bash shell:
openai tools fine_tunes.prepare_data -f yourfile.jsonl
It provides:
  - Exact line numbers with schema errors
  - Suggestions on how to fix issues
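- Check the encoding and line endings. Ensure that:
  - The file is UTF-8 without BOM
  - Line endings are Unix-style (\n), not Windows (\r\n)
To re-encode the file and fix the line endings in PowerShell:
(Get-Content -Raw yourfile.jsonl) -replace "`r`n", "`n" | Set-Content -Encoding utf8 -NoNewline yourfile_clean.jsonl
Note that in Windows PowerShell 5.1, -Encoding utf8 writes a BOM, while PowerShell 7 and later writes UTF-8 without one. -NoNewline keeps Set-Content from appending a platform line ending to the end of the file.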
- Handle Unicode and HTML safely. Decode Unicode escape sequences only if they are incorrectly double-escaped, i.e., they appear as \\ud83d\\udc49 instead of \ud83d\udc49. If unsure, leave them; valid Unicode escape sequences are acceptable. To remove HTML tags (optional), in Python:
import re

def remove_html(text):
    # Strip anything that looks like an HTML tag, e.g. <b> or </p>
    return re.sub(r'<[^>]+>', '', text)
- Create a small test.jsonl with 3–5 rows and test fine-tuning with it:
{"prompt": "What is 2+2?", "completion": "4"}
{"prompt": "Translate to French: Apple", "completion": "Pomme"}
- If this works, the issue is clearly in the formatting of your original file.
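If you would like to locate the problem lines yourself, the checks above can be sketched as a small Python pre-flight script. This is only a minimal sketch, assuming the two-key prompt/completion schema and the "UTF-8 without BOM, LF line endings" rules described in this answer. The names check_jsonl and write_sample are hypothetical helpers made up for illustration and are not part of any Microsoft or OpenAI tool.

```python
import json
from pathlib import Path

# Assumption: the two-key schema described in this answer.
REQUIRED_KEYS = {"prompt", "completion"}
UTF8_BOM = b"\xef\xbb\xbf"


def check_jsonl(path):
    """Return a list of human-readable problems found in a JSONL file."""
    raw = Path(path).read_bytes()
    problems = []

    # Byte-level checks: BOM and Windows line endings.
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(UTF8_BOM):]
    if b"\r\n" in raw:
        problems.append("file contains Windows (CRLF) line endings")

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"file is not valid UTF-8: {exc}")
        return problems

    # Split on "\n" only, so other Unicode separators inside strings are kept.
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # blank lines (e.g. after the final newline) are skipped
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
        elif set(record) != REQUIRED_KEYS:
            problems.append(
                f"line {number}: keys must be exactly {sorted(REQUIRED_KEYS)}"
            )
        elif not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
            problems.append(f"line {number}: values must be strings")
    return problems


def write_sample(src, dst, rows=5):
    """Copy the first `rows` non-blank lines of src to dst (UTF-8, no BOM, LF).

    The lines are copied as-is; they are not validated here.
    """
    text = Path(src).read_text(encoding="utf-8-sig")  # drops a BOM if present
    lines = [line.rstrip("\r") for line in text.split("\n") if line.strip()]
    sample = lines[:rows]
    data = "".join(line + "\n" for line in sample)
    Path(dst).write_bytes(data.encode("utf-8"))  # bytes: no newline translation
    return len(sample)
```

Calling check_jsonl("yourfile.jsonl") returns an empty list when none of these problems are found; otherwise each entry names the issue and, where applicable, the line number. write_sample("yourfile.jsonl", "test.jsonl", rows=5) is one way to produce the small test file from the last step.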
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close the thread by upvoting this reply and accepting it as an answer if it is helpful.