I need to implement an API endpoint. The input is
{"pdf_file": "http://myserver/somefolder/somefile.pdf"}
and the output is
{"key1": "value1", "key2": "value2"}

The PDF files contain text and run from a few pages to a few dozen pages.
My current test results:
downloading the file takes about 5 seconds,
and processing it takes about another 5 seconds.
The question is: how can I speed this up?
An example I found online:
~~~python
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor
from http import HTTPStatus
from typing import Dict, Optional
from uuid import UUID, uuid4

import aiohttp
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class InsurancePoliciesExtractionRequest(BaseModel):
    pdf_file: str  # URL of the PDF to download and process

class Job(BaseModel):
    uid: UUID = Field(default_factory=uuid4)  # default_factory so each Job gets a fresh UUID
    status: str = "in_progress"
    result: Optional[dict] = None

jobs: Dict[UUID, Job] = {}

def cpu_bound_func(data: bytes, filename: str) -> dict:
    # Placeholder for the actual PDF-processing logic (not shown in the post)
    return {"key1": "value1", "key2": "value2"}

async def run_in_process(fn, *args):
    loop = asyncio.get_event_loop()
    # run the CPU-bound function in the process pool, wait and return the result
    return await loop.run_in_executor(app.state.executor, fn, *args)

async def start_cpu_bound_task(uid: UUID, data: bytes, filename: str) -> None:
    # plain bytes (unlike io.BytesIO) can be pickled and handed to a worker process
    jobs[uid].result = await run_in_process(cpu_bound_func, data, filename)
    jobs[uid].status = "complete"

@app.post("/new_cpu_bound_task/", status_code=HTTPStatus.ACCEPTED)
async def task_handler(req: InsurancePoliciesExtractionRequest, background_tasks: BackgroundTasks):
    new_task = Job()
    jobs[new_task.uid] = new_task
    # --- my changes start here ---
    content = None
    async with aiohttp.ClientSession() as session:
        async with session.get(req.pdf_file) as resp:
            if resp.status == 200:
                content = await resp.read()
    filename = os.path.basename(req.pdf_file)
    # --- my changes end here ---
    background_tasks.add_task(start_cpu_bound_task, new_task.uid, content, filename)
    return new_task

@app.get("/status/{uid}")
async def status_handler(uid: UUID):
    return jobs[uid]

@app.on_event("startup")
async def startup_event():
    app.state.executor = ProcessPoolExecutor()

@app.on_event("shutdown")
async def on_shutdown():
    app.state.executor.shutdown()
~~~
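For reference, a minimal sketch of how a client could drive the two endpoints above: submit the PDF URL to create a job, then poll /status/{uid} until the background task finishes. The host and port are hypothetical, and the polling interval is arbitrary.

~~~python
# Hypothetical client for the endpoints above; host/port are examples only.
import time
import requests

resp = requests.post(
    "http://localhost:8000/new_cpu_bound_task/",
    json={"pdf_file": "http://myserver/somefolder/somefile.pdf"},
)
job = resp.json()  # e.g. {"uid": "...", "status": "in_progress", "result": null}

# Poll until the background task marks the job as complete.
while job["status"] != "complete":
    time.sleep(1)
    job = requests.get(f"http://localhost:8000/status/{job['uid']}").json()

print(job["result"])
~~~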
nyxsonsleep 2022-06-24 00:36:26 +08:00
Use multiple threads (connections) for the download.
As for the "processing", I don't really understand — if it's just downloading a file, the downloaded copy only needs to match the original, so what is there to process? Even with 4 threads, merging the downloaded chunks only costs a few milliseconds.
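To illustrate the multi-connection download idea, here is a minimal sketch using aiohttp and HTTP Range requests. It assumes the server supports Range requests and reports Content-Length; the URL and the connection count are examples only.

~~~python
# Sketch of a multi-connection (ranged) download; assumes the server
# supports HTTP Range requests and returns Content-Length on HEAD.
import asyncio
import aiohttp

async def ranged_download(url: str, connections: int = 4) -> bytes:
    async with aiohttp.ClientSession() as session:
        # Ask for the total size first.
        async with session.head(url) as resp:
            size = int(resp.headers["Content-Length"])

        # Split the file into one byte range per connection.
        chunk = size // connections
        ranges = [
            (i * chunk, size - 1 if i == connections - 1 else (i + 1) * chunk - 1)
            for i in range(connections)
        ]

        async def fetch(start: int, end: int) -> bytes:
            headers = {"Range": f"bytes={start}-{end}"}
            async with session.get(url, headers=headers) as resp:
                return await resp.read()

        # Download all ranges concurrently and stitch them back together.
        parts = await asyncio.gather(*(fetch(s, e) for s, e in ranges))
        return b"".join(parts)

# content = asyncio.run(ranged_download("http://myserver/somefolder/somefile.pdf"))
~~~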