之前探索了使用LLM从长文本中提取简单数值并进行计算的示例。https://blog.csdn.net/liliang199/article/details/159244753这里进一步探索横跨两个文本的复杂数值的提取和计算示例。所用资料和代码修改和参考自网络资料。1 文档获取1.1 下载数据这里从SEC EDGAR 获取苹果公司 2022 和 2023 年 10-K 的文本版本。对应链接如下所示aapl-20220924https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htmaapl-20230930https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htm为简化分析这里直接打开上述链接选中所有文本复制然后粘贴到本地。分别存储为aapl-20220924.txt和aapl-20230930.txt然后两文档合并若两文档合计token在128K以内则可直接拼接。with open(aapl-20230930.txt, r) as f: text_2023 f.read() with open(aapl-20220924.txt, r) as f: text_2022 f.read() print(f2023 长度: {len(text_2023)} 字符) print(f2022 长度: {len(text_2022)} 字符)输出如下所示2023 长度: 203704 字符2022 长度: 218592 字符1.2 tokens量估计这里使用tiktoken估计两个文档合并后的总token量示例程序如下所示。import tiktoken def num_tokens(text): enc tiktoken.get_encoding(cl100k_base) return len(enc.encode(text)) tokens_2023 num_tokens(text_2023) tokens_2022 num_tokens(text_2022) print(f2023 tokens: {tokens_2023}, 2022 tokens: {tokens_2022}, 合计: {tokens_2022 tokens_2023}) if tokens_2022 tokens_2023 120000: combined_text 苹果公司 2022 财年 10-K \n text_2022 \n\n 苹果公司 2023 财年 10-K \n text_2023 else: # 超出则需截断或使用 RAG combined_text text_2022[:60000] text_2023[:60000] # 简单截断可能导致信息丢失输出如下所示92k tokens在128k窗口内。2023 tokens: 45185, 2022 tokens: 47670, 合计: 928552 提取计算这里先说明需要提取的数据和计算指标。提取数据分别来自两个不同的文档比如2022财年总营收、2023财年总营收。部分计算指标会用到不同文档数据比如营收增长率、研发费用占营收比例变化。2.1 提示词这里采用提示词方式说明需要提取哪些数据以及需要计算哪些指标。提示词需清晰说明任务、给出计算要求并指示使用函数调用。这里还加入思维链指令让模型先推理再填写函数参数。prompt f 你是一位经验丰富的财务分析师。以下是苹果公司 2022 和 2023 财年 10-K 年报的部分文本。 请仔细阅读提取所需的财务数据并完成以下计算。所有金额单位统一为 **百万美元**。 **需要提取的原始数据必须从文本中查找** - revenue_20232023 财年总营收 - revenue_20222022 财年总营收 - cogs_20232023 财年营业成本 - cogs_20222022 财年营业成本 - net_income_20232023 财年净利润 - net_income_20222022 财年净利润 - r_and_d_20232023 财年研发费用 - r_and_d_20222022 财年研发费用 - total_assets_20232023 财年末总资产 - total_liabilities_20232023 财年末总负债 - operating_cash_flow_20232023 财年经营活动现金流 - capital_expenditure_20232023 财年资本支出通常为“购置固定资产”的现金流出 **需要计算的指标请根据上面提取的数据计算并填入 JSON** - revenue_growth营收增长率格式如 8.5% - gross_margin_20232023 毛利率格式如 40.2% - gross_margin_20222022 毛利率格式如 39.8% - net_profit_margin_20232023 净利润率格式如 25.0% - net_profit_margin_20222022 净利润率格式如 24.5% - r_and_d_pct_change研发费用占营收比例的变化百分点如 0.5pp - debt_to_assets_20232023 资产负债率格式如 80.1% - free_cash_flow_20232023 自由现金流单位百万美元数字 **请以 JSON 格式输出包含以上所有字段**。输出的 JSON 对象必须包含上述所有键且值为正确的数字或字符串百分比用字符串表示数字用数值表示。 文本内容 {combined_text} 请一步步推理然后输出 JSON。 2.2 LLM调用在准备好提示词后这里进一步调用 API 获取LLM的输出和推理过程并解析结果。response client.chat.completions.create( modelmodel_name, # 支持 JSON 模式的模型 messages[{role: user, content: prompt}], temperature0, max_tokens1500, response_format{type: json_object} # 强制输出 JSON ) # 提取返回的 JSON 字符串 content response.choices[0].message.content reasoning_content response.choices[0].message.reasoning_content print(模型返回的原始内容, content) print(f模型思考过程: , reasoning_content) # 解析 JSON try: result json.loads(content) print(\n解析后的结果) print(json.dumps(result, indent2, ensure_asciiFalse)) except json.JSONDecodeError as e: print(JSON 解析失败, e) # 可尝试用正则提取输出示例如下在思考过程(reasoning_content)中LLM先提取原始数据RawData再在梳理计算指标以及计算这些指标需要的数据。在此基础上给出计算数据并且进行了验证。虽然整个过程没有提到公式但后续验证代码示例说明所有指标均计算正确。可能LLM已经内化这些计算公式不需要显式提及。模型返回的原始内容 {revenue_2023: 383285,revenue_2022: 394328,cogs_2023: 214137,cogs_2022: 223546,net_income_2023: 96995,net_income_2022: 99803,r_and_d_2023: 29915,r_and_d_2022: 26251,total_assets_2023: 352583,total_liabilities_2023: 290437,operating_cash_flow_2023: 110543,capital_expenditure_2023: 10959,revenue_growth: -2.80%,gross_margin_2023: 44.13%,gross_margin_2022: 43.31%,net_profit_margin_2023: 25.30%,net_profit_margin_2022: 25.31%,r_and_d_pct_change: 1.15pp,debt_to_assets_2023: 82.37%,free_cash_flow_2023: 99584}模型思考过程: The user wants me to extract financial data from Apples 2022 and 2023 10-K reports and calculate specific financial metrics. I need to carefully read through the provided text to find the required data points.Let me identify what I need to extract:**Raw Data to Extract:**1. revenue_2023: 2023 fiscal year total revenue2. revenue_2022: 2022 fiscal year total revenue3. cogs_2023: 2023 fiscal year cost of sales4. cogs_2022: 2022 fiscal year cost of sales5. net_income_2023: 2023 fiscal year net income6. net_income_2022: 2022 fiscal year net income7. r_and_d_2023: 2023 fiscal year RD expense8. r_and_d_2022: 2022 fiscal year RD expense9. total_assets_2023: 2023 fiscal year end total assets10. total_liabilities_2023: 2023 fiscal year end total liabilities11. operating_cash_flow_2023: 2023 fiscal year operating cash flow12. capital_expenditure_2023: 2023 fiscal year capital expenditure (purchase of property, plant and equipment)**Calculated Metrics:**1. revenue_growth: Revenue growth rate2. gross_margin_2023: 2023 gross margin percentage3. gross_margin_2022: 2022 gross margin percentage4. net_profit_margin_2023: 2023 net profit margin percentage5. net_profit_margin_2022: 2022 net profit margin percentage6. r_and_d_pct_change: RD as % of revenue change (percentage points)7. debt_to_assets_2023: 2023 debt-to-assets ratio (liabilities/assets)8. free_cash_flow_2023: 2023 free cash flow (operating cash flow - capital expenditure)Ill start by locating the 2023 fiscal year data in the 2023 10-K report. The Consolidated Statements of Operations shows total net sales of $383,285 million for the year ended September 30, 2023, with cost of sales at $214,137 million and net income of $96,995 million. Research and development expenses totaled $29,915 million. From the Consolidated Balance Sheets, I can see total assets were $352,583 million and total liabilities were $290,437 million as of September 30, 2023. The Consolidated Statements of Cash Flows shows operating cash flow of $110,543 million and capital expenditures (payments for acquisition of property, plant and equipment) of $10,959 million for 2023.Now I need to find the corresponding 2022 data from the 2022 10-K report to complete my analysis.From the 2022 10-K Consolidated Statements of Operations, total net sales were $394,328 million for the year ended September 24, 2022, with cost of sales at $223,546 million and net income of $99,803 million. Research and development expenses were $26,251 million. The Consolidated Balance Sheets show total assets of $352,755 million and total liabilities of $302,083 million as of September 24, 2022. Operating cash flow from the Consolidated Statements of Cash Flows was $122,151 million, and capital expenditures were $10,708 million for 2022.Now I can calculate the key financial metrics. Revenue declined 2.81% year-over-year from $394,328 million to $383,285 million. Gross margin improved slightly from 43.31% in 2022 to 44.13% in 2023. Net profit margin decreased from 25.31% to 25.30%. RD spending as a percentage of revenue increased from 6.66% to 7.81%, representing a 1.15 percentage point increase. The debt-to-assets ratio improved from 85.63% to 82.37%. Free cash flow for 2023 was $99,584 million after subtracting capital expenditures from operating cash flow.I should verify these calculations against the source documents to ensure accuracy before finalizing the analysis.From the 2023 10-K Consolidated Statements of Operations, I can confirm: Total net sales of $383,285 million, cost of sales at $214,137 million, net income of $96,995 million, and RD expenses of $29,915 million. The Balance Sheet shows total assets of $352,583 million and total liabilities of $290,437 million. The Cash Flow Statement reports operating cash flow of $110,543 million and capital expenditures of $10,959 million.For 2022, the 10-K shows: Total net sales of $394,328 million, cost of sales at $223,546 million, net income of $99,803 million, and RD expenses of $26,251 million. The Balance Sheet indicates total assets of $352,755 million and total liabilities of $302,083 million. Operating cash flow was $122,151 million with capital expenditures of $10,708 million.Now Ill calculate the key metrics. Revenue growth came in at -2.80%, reflecting the year-over-year decline. Gross margin improved to 44.13% in 2023 from 43.31% in 2022. Net profit margin remained relatively stable at 25.30% versus 25.31%. RD intensity increased by 1.15 percentage points to 7.81% of revenue. The debt-to-assets ratio decreased to 82.37% from the prior year. Free cash flow for 2023 totaled $99,584 million after subtracting capital expenditures from operating cash flow.解析后的结果{revenue_2023: 383285,revenue_2022: 394328,cogs_2023: 214137,cogs_2022: 223546,net_income_2023: 96995,net_income_2022: 99803,r_and_d_2023: 29915,r_and_d_2022: 26251,total_assets_2023: 352583,total_liabilities_2023: 290437,operating_cash_flow_2023: 110543,capital_expenditure_2023: 10959,revenue_growth: -2.80%,gross_margin_2023: 44.13%,gross_margin_2022: 43.31%,net_profit_margin_2023: 25.30%,net_profit_margin_2022: 25.31%,r_and_d_pct_change: 1.15pp,debt_to_assets_2023: 82.37%,free_cash_flow_2023: 99584}2.3 真实对比这里通过与与真实财报数据进行比对评估模型准确性。示例代码如下# 真实数据单位百万美元 real_data { revenue_2023: 383285, revenue_2022: 394328, cogs_2023: 214137, cogs_2022: 223546, net_income_2023: 96995, net_income_2022: 99803, r_and_d_2023: 29915, r_and_d_2022: 26251, total_assets_2023: 352583, total_liabilities_2023: 290437, operating_cash_flow_2023: 110543, capital_expenditure_2023: 10959 } # 真实计算值 real_metrics { revenue_growth: f{(real_data[revenue_2023] - real_data[revenue_2022])/real_data[revenue_2022]*100:.2f}%, gross_margin_2023: f{(real_data[revenue_2023] - real_data[cogs_2023])/real_data[revenue_2023]*100:.2f}%, gross_margin_2022: f{(real_data[revenue_2022] - real_data[cogs_2022])/real_data[revenue_2022]*100:.2f}%, net_profit_margin_2023: f{real_data[net_income_2023]/real_data[revenue_2023]*100:.2f}%, net_profit_margin_2022: f{real_data[net_income_2022]/real_data[revenue_2022]*100:.2f}%, r_and_d_pct_change: f{(real_data[r_and_d_2023]/real_data[revenue_2023] - real_data[r_and_d_2022]/real_data[revenue_2022])*100:.2f}pp, debt_to_assets_2023: f{real_data[total_liabilities_2023]/real_data[total_assets_2023]*100:.2f}%, free_cash_flow_2023: real_data[operating_cash_flow_2023] - real_data[capital_expenditure_2023] } # 对比模型输出 for key in real_metrics: if key in result: pred result[key] real real_metrics[key] print(f{key}: 预测 {pred} vs 真实 {real}) else: print(f警告模型输出缺少字段 {key})输出示例如下输出显示LLM计算结果与真实指标非常接近。revenue_growth: 预测 -2.80% vs 真实 -2.80%gross_margin_2023: 预测 44.13% vs 真实 44.13%gross_margin_2022: 预测 43.31% vs 真实 43.31%net_profit_margin_2023: 预测 25.30% vs 真实 25.31%net_profit_margin_2022: 预测 25.31% vs 真实 25.31%r_and_d_pct_change: 预测 1.15pp vs 真实 1.15ppdebt_to_assets_2023: 预测 82.37% vs 真实 82.37%free_cash_flow_2023: 预测 99584 vs 真实 99584苹果公司10-K 2022 2023财务数据如下指标20232022总营收$383,285 M$394,328 M营业成本$214,137 M$223,546 M净利润$96,995 M$99,803 M研发费用$29,915 M$26,251 M总资产$352,583 M$352,755 M (2022末)总负债$290,437 M$302,083 M (2022末)经营活动现金流$110,543 M$122,151 M资本支出$10,959 M$10,708 M数据来源链接如下https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htmhttps://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htmreference---LLM数值提取-计算场景示例https://blog.csdn.net/liliang199/article/details/159244753LLM长上下文和数值类有效输出的关系探索https://blog.csdn.net/liliang199/article/details/159175752
LLM复杂数值的提取计算场景示例
之前探索了使用LLM从长文本中提取简单数值并进行计算的示例。https://blog.csdn.net/liliang199/article/details/159244753这里进一步探索横跨两个文本的复杂数值的提取和计算示例。所用资料和代码修改和参考自网络资料。1 文档获取1.1 下载数据这里从SEC EDGAR 获取苹果公司 2022 和 2023 年 10-K 的文本版本。对应链接如下所示aapl-20220924https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htmaapl-20230930https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htm为简化分析这里直接打开上述链接选中所有文本复制然后粘贴到本地。分别存储为aapl-20220924.txt和aapl-20230930.txt然后两文档合并若两文档合计token在128K以内则可直接拼接。with open(aapl-20230930.txt, r) as f: text_2023 f.read() with open(aapl-20220924.txt, r) as f: text_2022 f.read() print(f2023 长度: {len(text_2023)} 字符) print(f2022 长度: {len(text_2022)} 字符)输出如下所示2023 长度: 203704 字符2022 长度: 218592 字符1.2 tokens量估计这里使用tiktoken估计两个文档合并后的总token量示例程序如下所示。import tiktoken def num_tokens(text): enc tiktoken.get_encoding(cl100k_base) return len(enc.encode(text)) tokens_2023 num_tokens(text_2023) tokens_2022 num_tokens(text_2022) print(f2023 tokens: {tokens_2023}, 2022 tokens: {tokens_2022}, 合计: {tokens_2022 tokens_2023}) if tokens_2022 tokens_2023 120000: combined_text 苹果公司 2022 财年 10-K \n text_2022 \n\n 苹果公司 2023 财年 10-K \n text_2023 else: # 超出则需截断或使用 RAG combined_text text_2022[:60000] text_2023[:60000] # 简单截断可能导致信息丢失输出如下所示92k tokens在128k窗口内。2023 tokens: 45185, 2022 tokens: 47670, 合计: 928552 提取计算这里先说明需要提取的数据和计算指标。提取数据分别来自两个不同的文档比如2022财年总营收、2023财年总营收。部分计算指标会用到不同文档数据比如营收增长率、研发费用占营收比例变化。2.1 提示词这里采用提示词方式说明需要提取哪些数据以及需要计算哪些指标。提示词需清晰说明任务、给出计算要求并指示使用函数调用。这里还加入思维链指令让模型先推理再填写函数参数。prompt f 你是一位经验丰富的财务分析师。以下是苹果公司 2022 和 2023 财年 10-K 年报的部分文本。 请仔细阅读提取所需的财务数据并完成以下计算。所有金额单位统一为 **百万美元**。 **需要提取的原始数据必须从文本中查找** - revenue_20232023 财年总营收 - revenue_20222022 财年总营收 - cogs_20232023 财年营业成本 - cogs_20222022 财年营业成本 - net_income_20232023 财年净利润 - net_income_20222022 财年净利润 - r_and_d_20232023 财年研发费用 - r_and_d_20222022 财年研发费用 - total_assets_20232023 财年末总资产 - total_liabilities_20232023 财年末总负债 - operating_cash_flow_20232023 财年经营活动现金流 - capital_expenditure_20232023 财年资本支出通常为“购置固定资产”的现金流出 **需要计算的指标请根据上面提取的数据计算并填入 JSON** - revenue_growth营收增长率格式如 8.5% - gross_margin_20232023 毛利率格式如 40.2% - gross_margin_20222022 毛利率格式如 39.8% - net_profit_margin_20232023 净利润率格式如 25.0% - net_profit_margin_20222022 净利润率格式如 24.5% - r_and_d_pct_change研发费用占营收比例的变化百分点如 0.5pp - debt_to_assets_20232023 资产负债率格式如 80.1% - free_cash_flow_20232023 自由现金流单位百万美元数字 **请以 JSON 格式输出包含以上所有字段**。输出的 JSON 对象必须包含上述所有键且值为正确的数字或字符串百分比用字符串表示数字用数值表示。 文本内容 {combined_text} 请一步步推理然后输出 JSON。 2.2 LLM调用在准备好提示词后这里进一步调用 API 获取LLM的输出和推理过程并解析结果。response client.chat.completions.create( modelmodel_name, # 支持 JSON 模式的模型 messages[{role: user, content: prompt}], temperature0, max_tokens1500, response_format{type: json_object} # 强制输出 JSON ) # 提取返回的 JSON 字符串 content response.choices[0].message.content reasoning_content response.choices[0].message.reasoning_content print(模型返回的原始内容, content) print(f模型思考过程: , reasoning_content) # 解析 JSON try: result json.loads(content) print(\n解析后的结果) print(json.dumps(result, indent2, ensure_asciiFalse)) except json.JSONDecodeError as e: print(JSON 解析失败, e) # 可尝试用正则提取输出示例如下在思考过程(reasoning_content)中LLM先提取原始数据RawData再在梳理计算指标以及计算这些指标需要的数据。在此基础上给出计算数据并且进行了验证。虽然整个过程没有提到公式但后续验证代码示例说明所有指标均计算正确。可能LLM已经内化这些计算公式不需要显式提及。模型返回的原始内容 {revenue_2023: 383285,revenue_2022: 394328,cogs_2023: 214137,cogs_2022: 223546,net_income_2023: 96995,net_income_2022: 99803,r_and_d_2023: 29915,r_and_d_2022: 26251,total_assets_2023: 352583,total_liabilities_2023: 290437,operating_cash_flow_2023: 110543,capital_expenditure_2023: 10959,revenue_growth: -2.80%,gross_margin_2023: 44.13%,gross_margin_2022: 43.31%,net_profit_margin_2023: 25.30%,net_profit_margin_2022: 25.31%,r_and_d_pct_change: 1.15pp,debt_to_assets_2023: 82.37%,free_cash_flow_2023: 99584}模型思考过程: The user wants me to extract financial data from Apples 2022 and 2023 10-K reports and calculate specific financial metrics. I need to carefully read through the provided text to find the required data points.Let me identify what I need to extract:**Raw Data to Extract:**1. revenue_2023: 2023 fiscal year total revenue2. revenue_2022: 2022 fiscal year total revenue3. cogs_2023: 2023 fiscal year cost of sales4. cogs_2022: 2022 fiscal year cost of sales5. net_income_2023: 2023 fiscal year net income6. net_income_2022: 2022 fiscal year net income7. r_and_d_2023: 2023 fiscal year RD expense8. r_and_d_2022: 2022 fiscal year RD expense9. total_assets_2023: 2023 fiscal year end total assets10. total_liabilities_2023: 2023 fiscal year end total liabilities11. operating_cash_flow_2023: 2023 fiscal year operating cash flow12. capital_expenditure_2023: 2023 fiscal year capital expenditure (purchase of property, plant and equipment)**Calculated Metrics:**1. revenue_growth: Revenue growth rate2. gross_margin_2023: 2023 gross margin percentage3. gross_margin_2022: 2022 gross margin percentage4. net_profit_margin_2023: 2023 net profit margin percentage5. net_profit_margin_2022: 2022 net profit margin percentage6. r_and_d_pct_change: RD as % of revenue change (percentage points)7. debt_to_assets_2023: 2023 debt-to-assets ratio (liabilities/assets)8. free_cash_flow_2023: 2023 free cash flow (operating cash flow - capital expenditure)Ill start by locating the 2023 fiscal year data in the 2023 10-K report. The Consolidated Statements of Operations shows total net sales of $383,285 million for the year ended September 30, 2023, with cost of sales at $214,137 million and net income of $96,995 million. Research and development expenses totaled $29,915 million. From the Consolidated Balance Sheets, I can see total assets were $352,583 million and total liabilities were $290,437 million as of September 30, 2023. The Consolidated Statements of Cash Flows shows operating cash flow of $110,543 million and capital expenditures (payments for acquisition of property, plant and equipment) of $10,959 million for 2023.Now I need to find the corresponding 2022 data from the 2022 10-K report to complete my analysis.From the 2022 10-K Consolidated Statements of Operations, total net sales were $394,328 million for the year ended September 24, 2022, with cost of sales at $223,546 million and net income of $99,803 million. Research and development expenses were $26,251 million. The Consolidated Balance Sheets show total assets of $352,755 million and total liabilities of $302,083 million as of September 24, 2022. Operating cash flow from the Consolidated Statements of Cash Flows was $122,151 million, and capital expenditures were $10,708 million for 2022.Now I can calculate the key financial metrics. Revenue declined 2.81% year-over-year from $394,328 million to $383,285 million. Gross margin improved slightly from 43.31% in 2022 to 44.13% in 2023. Net profit margin decreased from 25.31% to 25.30%. RD spending as a percentage of revenue increased from 6.66% to 7.81%, representing a 1.15 percentage point increase. The debt-to-assets ratio improved from 85.63% to 82.37%. Free cash flow for 2023 was $99,584 million after subtracting capital expenditures from operating cash flow.I should verify these calculations against the source documents to ensure accuracy before finalizing the analysis.From the 2023 10-K Consolidated Statements of Operations, I can confirm: Total net sales of $383,285 million, cost of sales at $214,137 million, net income of $96,995 million, and RD expenses of $29,915 million. The Balance Sheet shows total assets of $352,583 million and total liabilities of $290,437 million. The Cash Flow Statement reports operating cash flow of $110,543 million and capital expenditures of $10,959 million.For 2022, the 10-K shows: Total net sales of $394,328 million, cost of sales at $223,546 million, net income of $99,803 million, and RD expenses of $26,251 million. The Balance Sheet indicates total assets of $352,755 million and total liabilities of $302,083 million. Operating cash flow was $122,151 million with capital expenditures of $10,708 million.Now Ill calculate the key metrics. Revenue growth came in at -2.80%, reflecting the year-over-year decline. Gross margin improved to 44.13% in 2023 from 43.31% in 2022. Net profit margin remained relatively stable at 25.30% versus 25.31%. RD intensity increased by 1.15 percentage points to 7.81% of revenue. The debt-to-assets ratio decreased to 82.37% from the prior year. Free cash flow for 2023 totaled $99,584 million after subtracting capital expenditures from operating cash flow.解析后的结果{revenue_2023: 383285,revenue_2022: 394328,cogs_2023: 214137,cogs_2022: 223546,net_income_2023: 96995,net_income_2022: 99803,r_and_d_2023: 29915,r_and_d_2022: 26251,total_assets_2023: 352583,total_liabilities_2023: 290437,operating_cash_flow_2023: 110543,capital_expenditure_2023: 10959,revenue_growth: -2.80%,gross_margin_2023: 44.13%,gross_margin_2022: 43.31%,net_profit_margin_2023: 25.30%,net_profit_margin_2022: 25.31%,r_and_d_pct_change: 1.15pp,debt_to_assets_2023: 82.37%,free_cash_flow_2023: 99584}2.3 真实对比这里通过与与真实财报数据进行比对评估模型准确性。示例代码如下# 真实数据单位百万美元 real_data { revenue_2023: 383285, revenue_2022: 394328, cogs_2023: 214137, cogs_2022: 223546, net_income_2023: 96995, net_income_2022: 99803, r_and_d_2023: 29915, r_and_d_2022: 26251, total_assets_2023: 352583, total_liabilities_2023: 290437, operating_cash_flow_2023: 110543, capital_expenditure_2023: 10959 } # 真实计算值 real_metrics { revenue_growth: f{(real_data[revenue_2023] - real_data[revenue_2022])/real_data[revenue_2022]*100:.2f}%, gross_margin_2023: f{(real_data[revenue_2023] - real_data[cogs_2023])/real_data[revenue_2023]*100:.2f}%, gross_margin_2022: f{(real_data[revenue_2022] - real_data[cogs_2022])/real_data[revenue_2022]*100:.2f}%, net_profit_margin_2023: f{real_data[net_income_2023]/real_data[revenue_2023]*100:.2f}%, net_profit_margin_2022: f{real_data[net_income_2022]/real_data[revenue_2022]*100:.2f}%, r_and_d_pct_change: f{(real_data[r_and_d_2023]/real_data[revenue_2023] - real_data[r_and_d_2022]/real_data[revenue_2022])*100:.2f}pp, debt_to_assets_2023: f{real_data[total_liabilities_2023]/real_data[total_assets_2023]*100:.2f}%, free_cash_flow_2023: real_data[operating_cash_flow_2023] - real_data[capital_expenditure_2023] } # 对比模型输出 for key in real_metrics: if key in result: pred result[key] real real_metrics[key] print(f{key}: 预测 {pred} vs 真实 {real}) else: print(f警告模型输出缺少字段 {key})输出示例如下输出显示LLM计算结果与真实指标非常接近。revenue_growth: 预测 -2.80% vs 真实 -2.80%gross_margin_2023: 预测 44.13% vs 真实 44.13%gross_margin_2022: 预测 43.31% vs 真实 43.31%net_profit_margin_2023: 预测 25.30% vs 真实 25.31%net_profit_margin_2022: 预测 25.31% vs 真实 25.31%r_and_d_pct_change: 预测 1.15pp vs 真实 1.15ppdebt_to_assets_2023: 预测 82.37% vs 真实 82.37%free_cash_flow_2023: 预测 99584 vs 真实 99584苹果公司10-K 2022 2023财务数据如下指标20232022总营收$383,285 M$394,328 M营业成本$214,137 M$223,546 M净利润$96,995 M$99,803 M研发费用$29,915 M$26,251 M总资产$352,583 M$352,755 M (2022末)总负债$290,437 M$302,083 M (2022末)经营活动现金流$110,543 M$122,151 M资本支出$10,959 M$10,708 M数据来源链接如下https://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019322000108/aapl-20220924.htmhttps://www.sec.gov/ix?doc/Archives/edgar/data/320193/000032019323000106/aapl-20230930.htmreference---LLM数值提取-计算场景示例https://blog.csdn.net/liliang199/article/details/159244753LLM长上下文和数值类有效输出的关系探索https://blog.csdn.net/liliang199/article/details/159175752