Tricky problem: how can Python parse a pipe-like string and convert it into objects of a definite type?

I want to build something that can be used either as a CLI or as a library. Roughly:

python main.py data.csv --transform "parser | get='name' | len==10 | original | parser | get='age'"

It iterates over all rows, parses the data, finds the rows whose "name" has length 10, and returns "age".

The data passed between stages:
class Item:
    original: Any
    value: Any

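I have written the classes Parser, Len, Get and Original. My initial plan is to register the classes in a dict ahead of time, parse the string, and use each stage's operator and value to initialize the matching class: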

pipe_units = [
    Parser(),
    Get("=", "name"),
    Len("==", "10"),
    Original(),
    Parser(),
    Get("=", "age"),
]
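
A minimal sketch of the registry idea described above, using only the standard library. The names `Stage`, `REGISTRY`, `STAGE_RE` and `build_stage` are hypothetical, and the stage classes are empty placeholders rather than the real implementations from the post; the only assumption is that each stage looks like `name`, optionally followed by an operator and a value.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """Hypothetical base class: a stage keeps an optional operator and argument."""
    op: Optional[str] = None
    arg: Optional[str] = None


# Placeholder stage classes; real ones would add their own behaviour.
class Parser(Stage): ...
class Get(Stage): ...
class Len(Stage): ...
class Original(Stage): ...


# Registry mapping the name used in the pipe string to the class to build.
REGISTRY = {"parser": Parser, "get": Get, "len": Len, "original": Original}

# name, then optionally an operator and a value, e.g. "len==10" or "get='name'".
STAGE_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*(?:(?P<op>==|!=|<=|>=|=|<|>)\s*(?P<arg>.+?))?\s*$"
)


def build_stage(text: str) -> Stage:
    """Turn the text of one stage into an instance of the registered class."""
    match = STAGE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse stage: {text!r}")
    cls = REGISTRY.get(match["name"])
    if cls is None:
        raise ValueError(f"unknown stage: {match['name']!r}")
    arg = match["arg"]
    # Strip one pair of matching quotes around the value, if present.
    if arg is not None and len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "'\"":
        arg = arg[1:-1]
    return cls(match["op"], arg)
```

With this, `build_stage("get='name'")` gives a `Get` instance with `op="="` and `arg="name"`, and an unknown stage name fails early with a `ValueError` instead of surfacing later while the pipe runs.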

Then:

pipe = CompiledPipe(pipe_units)

wrapped_records = CsvReader(f)  # also a pipe unit

pipe.set_upstream(wrapped_records)  # or: wrapped_records >> pipe

for out_record in pipe:
    print(out_record)
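
One way units like these could be chained is with plain generators, where each unit wraps the stream produced by the previous one. This is only a sketch under assumptions: `run_pipe` is a hypothetical helper standing in for `CompiledPipe` and `set_upstream`, each raw record is assumed to be a JSON string rather than a CSV row, and the `Get` operator is accepted but ignored.

```python
import json
import operator
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Item:
    original: Any  # the untouched input record
    value: Any     # the value as transformed so far


# Comparison operators a filter stage might accept (an assumed subset).
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


class Parser:
    """Assumption: the current value is a JSON string to decode."""
    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, json.loads(item.value))


class Get:
    def __init__(self, op: str, field: str):
        self.field = field  # op is unused in this sketch
    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, item.value[self.field])


class Len:
    """Filter: keep items whose value length satisfies the comparison."""
    def __init__(self, op: str, n: str):
        self.compare = OPS[op]
        self.n = int(n)
    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            if self.compare(len(item.value), self.n):
                yield item


class Original:
    """Reset the value back to the original record."""
    def __call__(self, items: Iterable[Item]) -> Iterator[Item]:
        for item in items:
            yield Item(item.original, item.original)


def run_pipe(units, source: Iterable[Any]) -> Iterator[Item]:
    """Wrap each raw record in an Item and chain the units lazily."""
    stream: Iterator[Item] = (Item(raw, raw) for raw in source)
    for unit in units:
        stream = unit(stream)
    return stream


rows = ['{"name": "Alexandria", "age": 30}', '{"name": "Bob", "age": 41}']
units = [Parser(), Get("=", "name"), Len("==", "10"), Original(), Parser(), Get("=", "age")]
ages = [item.value for item in run_pipe(units, rows)]
print(ages)  # [30]
```

Because every unit is lazy, rows are processed one at a time, and a filter such as `Len` simply does not yield the rows it rejects.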

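Does this approach have any obvious flaws? Is there a good way to parse the pipe string? Right now I just use split and similar methods, which feels crude. Is there an established industry term for this kind of parsing? Thanks, everyone.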


9 replies • 2024-08-15 13:49:49 +08:00

fgwmlhdkkkw

208 days ago

Try this:

https://github.com/lark-parser/lark

ipwx

208 days ago

For writing a DSL, you can use pyparsing.

liberize

208 days ago via Android

If the name after get contains '|', a plain split will break.
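
To illustrate the point, here is a small standard-library sketch of a splitter that ignores `|` inside quotes. `split_pipeline` is a hypothetical helper name, and the sketch assumes values are wrapped in single or double quotes with no escape sequences.

```python
def split_pipeline(text: str) -> list[str]:
    """Split on '|' only when it appears outside a quoted value."""
    stages: list[str] = []
    current: list[str] = []
    quote = None  # the quote character we are currently inside, if any
    for ch in text:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote reached
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "|":
            stages.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote:
        raise ValueError("unterminated quote in pipeline string")
    stages.append("".join(current).strip())
    return stages


text = "parser | get='first|last' | len==10"
naive = [part.strip() for part in text.split("|")]  # 4 pieces: the quoted value is cut in two
stages = split_pipeline(text)  # 3 pieces: the quoted '|' is preserved
```

A grammar-based parser handles the same case through a quoted-string token, which is one reason the library suggestions in this thread scale better than hand-written splitting.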

GeekGao

208 days ago

```
from pyparsing import Word, alphanums, Suppress, Group, OneOrMore, Optional

def parse_pipeline(pipeline_string):
    # Basic elements: a command name, an argument token, and the pipe separator
    command = Word(alphanums + "_")
    argument = Word(alphanums + "_='")
    pipe = Suppress("|")

    # A command is a name optionally followed by one or more arguments
    command_structure = Group(command + Optional(Group(OneOrMore(argument))))

    # The whole pipeline: commands separated by "|"
    pipeline = OneOrMore(command_structure + Optional(pipe))

    # Parse the string
    parsed = pipeline.parseString(pipeline_string)

    # Convert each parsed group into a plain dict
    result = []
    for item in parsed:
        if len(item) == 1:
            result.append({"command": item[0], "args": []})
        else:
            result.append({"command": item[0], "args": item[1].asList()})

    return result

# Usage
pipeline_str = "parser | get='name' | len==10 | original | parser | get='age'"
parsed_pipeline = parse_pipeline(pipeline_str)
print(parsed_pipeline)

```

Output:
```
[{'command': 'parser', 'args': []}, {'command': 'get', 'args': ["='name'"]}, {'command': 'len', 'args': ['==10']}, {'command': 'original', 'args': []}, {'command': 'parser', 'args': []}, {'command': 'get', 'args': ["='age'"]}]
```

Just a rough starting point; hopefully it prompts better answers.