CJ.Blog


Calling Python Functions Dynamically

Requirement

HTTP -> Aliyun FC -> Python handler -> dynamic condition -> Template

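After an invoice file has been run through OCR, the required information is extracted from the resulting JSON.

Each supplier has its own invoice format, and there are multiple suppliers, identified by supplier name. My approach is one .py file per supplier, with the file named after the supplier code.

While writing this, I wanted to avoid the following code structure: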

import supplier1 as s1
import supplier2 as s2
if supplier_code == 'supplier1':
    s1.parse()
elif supplier_code == 'supplier2':
    s2.parse()
....

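This approach is clearly not flexible or dynamic enough. After some reading, I found that Python can import modules dynamically, which led to the following code: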


import importlib
import common as util
import traceback

def dispatch(plant:str,supplier:str,file_extension:str,ocr_result_obj):
    try:
        # Import template/<supplier>.py by name, then look up its parse function.
        dynamic_module = importlib.import_module('template.' + supplier)
        func_call = getattr(dynamic_module, "parse")
        return func_call(plant,supplier,file_extension,ocr_result_obj,util)
    # Specific handlers must come before "except Exception", otherwise they are never reached.
    except ModuleNotFoundError as e1:
        raise ModuleNotFoundError("Module not found: {0}".format(e1))
    except AttributeError as ae:
        raise AttributeError("Call failed: {0}".format(ae))
    except TypeError as te:
        raise TypeError("Type error: {0}".format(te))
    except Exception as e:
        # Anything else is only logged; dispatch then returns None.
        traceback.print_exc()

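import_module imports the module dynamically, and getattr looks up the function to call on it. With this approach, adding a parsing template for a new supplier only requires adding a .py file with the matching name to the template directory and defining a parse function in it. Nothing else has to change.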
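To see the mechanism in isolation, here is a minimal sketch that builds a throwaway template package in a temporary directory and dispatches to it by name. The supplier code "acme", the two-argument parse signature, and the returned string are made up for illustration; they are not part of the project above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "template" package so the sketch runs on its own.
# "acme" is a hypothetical supplier code used only for this demo.
root = Path(tempfile.mkdtemp())
pkg = root / "template"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "acme.py").write_text(
    "def parse(plant, supplier):\n"
    "    return 'parsed {0} invoice for plant {1}'.format(supplier, plant)\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()


def dispatch(plant, supplier):
    # Resolve template/<supplier>.py at runtime and call its parse function.
    module = importlib.import_module("template." + supplier)
    return getattr(module, "parse")(plant, supplier)


print(dispatch("P01", "acme"))

# A supplier without a template file surfaces as ModuleNotFoundError.
try:
    dispatch("P01", "unknown")
except ModuleNotFoundError as exc:
    print("no template:", exc)
```

Supporting another supplier in this sketch means dropping one more file into the package; dispatch itself stays untouched.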

The template for one supplier looks like this:

import re

def get_ln(text:str):
    pattern = re.compile(r'-?[1-9]\d*')
    return pattern.search(text).group()

def get_po(text:str):
    # True if the text starts with a PO number such as "PXZ12345" (spaces allowed).
    pattern = re.compile(r'P\s*X\s*Z\s*[0-9]{5}')
    return pattern.match(text) is not None

def get_invoice_amount(text:str):
    # True if the text ends with "00 USD", i.e. looks like an amount in USD.
    pattern = re.compile(r'00\s*USD$')
    return pattern.search(text) is not None

def get_ip_no(text:str):
    # True if the text starts with "de" and ends with a digit (case-insensitive).
    pattern = re.compile(r'^\bde\s*.*\s*[0-9]$', re.IGNORECASE)
    return pattern.search(text) is not None

def parse(plant,supplier,file_extension,ocr_result_obj,util):
    # print("Received arguments: {0} {1}".format(plant,str(ocr_result_obj['Headers'])))
    PROPERTY = ""

    # PDF and image OCR results use different keys for the entries and their text.
    results = None
    if file_extension == '.pdf':
        PROPERTY = "Text"
        results = ocr_result_obj['Body']['Data']['Results']
    else:
        PROPERTY = "Word"
        results = ocr_result_obj['Body']['Data']['WordsInfo']

    po = util.find_match_by(results,get_po,PROPERTY)
    invoice_amount =  util.find_match_by(results,get_invoice_amount,PROPERTY)
    lp_no = util.find_match_by(results,get_ip_no,PROPERTY)
    ln = util.find_result_by_text('Invoice number',results,get_ln,PROPERTY)

    if po is not None:
        po = po.replace(' ','')
    if lp_no is not None:
        lp_no = util.get_match_number(lp_no)
    if invoice_amount is not None:
        invoice_amount = invoice_amount.replace(' ','').replace(',','').replace(',','').replace('USD','')

    # return [ln,po,invoice_amount,lp_no]
    return util.wrapper(ln,invoice_amount,lp_no,po,None,None)
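The helpers on util come from the common module, which is not shown in this post. As a rough guess at how find_match_by could work, the sketch below assumes it returns the text of the first OCR entry accepted by the predicate. The function body and the sample entries are assumptions, not the original implementation.

```python
import re


def find_match_by(results, predicate, prop):
    """Hypothetical stand-in for util.find_match_by: return the text of the
    first entry for which predicate(text) is true, or None if none match."""
    for item in results:
        text = item.get(prop)
        if text is not None and predicate(text):
            return text
    return None


def get_po(text):
    # Same pattern as in the template; match() only matches at the start.
    return re.match(r'P\s*X\s*Z\s*[0-9]{5}', text) is not None


# Made-up OCR entries in the shape the template reads for PDF input.
sample = [{"Text": "Invoice number 1024"}, {"Text": "P XZ 12345"}]
po = find_match_by(sample, get_po, "Text")
print(po.replace(' ', ''))
```

Passing the predicate in as an argument keeps the supplier-specific regular expressions in the template file, while the loop over the OCR entries stays in one shared place.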

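Drawback:

The dynamically called function cannot import other libraries in its own file; doing so raises an error. I worked around this with callbacks (the helpers are passed in as an argument), but that adds some complexity to the code.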
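The workaround of handing helpers to the template, rather than importing them inside it, can be sketched as follows. SimpleNamespace stands in for the shared common module, and clean_amount is a hypothetical helper invented for this example.

```python
from types import SimpleNamespace


def clean_amount(text):
    # Hypothetical helper: strip spaces, thousands separators and the currency code.
    return text.replace(' ', '').replace(',', '').replace('USD', '')


# Stand-in for the shared helper module owned by the dispatcher.
util = SimpleNamespace(clean_amount=clean_amount)


def parse(raw_amount, util):
    # The template imports nothing from the project; it only uses what it is handed.
    return util.clean_amount(raw_amount)


print(parse("1,250.00 USD", util))
```

The cost is the one noted above: every template has to accept and thread through the extra argument.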