V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
V2EX 提问指南
xiyangyang
V2EX  ›  问与答

Mysql 查询语句怎么写?

  •  
  •   xiyangyang · 2014-12-10 10:58:12 +08:00 · 3639 次点击
    这是一个创建于 3618 天前的主题,其中的信息可能已经有所发展或是发生改变。

    开发市场调查业务,有一个调查问题表responses (一百万行), 是市场调查的原始数据,包含survey_id(调查表的类型列),response_no(被调查者列),interview_date(调查日期), question_label(问题列), value(回答列), section_unique_id(部门列)等。
    每一行对应某个被调查者回答的一个问题和它的调查结果。一个被调查者一次会回答30个问题,所以会产生30行。
    另有一个计算公式表 (40行), 是对调查结果的分析计算公式,这些公式都是sql语句。
    根据计算公式,产生一个结果表results。
    现在的问题是,要设计一些sql语句,调查有多少人的回答是类似如下这样的组合:
    1. 回答问题Q1,答案是1或8或9
    并且
    2. 回答问题Q2,答案是1或8或9
    并且
    。。。。。。
    最好能用group by section。

    比如如下这个例子:
    计算公式的说明是这样。
    ((Q2A = 1 OR Q2A = 8 OR Q2A = 9) AND (Q2B = 1 OR Q2B = 8 OR Q2B = 9) AND (Q2C = 1 OR Q2C = 8 OR Q2C = 9) AND (Q2D = 1 OR Q2D = 8 OR Q2D = 9) AND (Q2E = 1 OR Q2E = 8 OR Q2E = 9) AND (Q2F = 1 OR Q2F = 8 OR Q2F = 9) AND (Q2G = 1 OR Q2G = 8 OR Q2G = 9) AND (Q2H = 1 OR Q2H = 8 OR Q2H = 9) AND (Q2I = 1 OR Q2I = 8 OR Q2I = 9) AND (Q5 = 1 OR Q5 = 8 OR Q5 = 9) AND (Q6 = 1 OR Q6 = 8 OR Q6 = 9))

    我现在写了这样一个mysql语句:

    SELECT section_unique_id as "section_unique_id", COUNT(*) as "hit" FROM responses WHERE
    question_label = "Q2A" AND value IN (1,8,9)
    AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2B" AND value IN (1,8,9))
    AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2C" AND  value IN (1,8,9))
    AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2D" AND value IN (1,8,9))
    AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2E" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2F" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2G" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2H" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q2I" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q5" AND value IN (1,8,9))
     AND
     (response_no, survey_id, interview_date) IN (SELECT DISTINCT response_no, survey_id, interview_date FROM responses WHERE question_label = "Q6" AND value IN (1,8,9))
    

    结果,运行一次,用了12秒,太慢了。
    请教mysql高手,有没有办法能加快计算速度。

    7 条回复    2014-12-10 16:23:36 +08:00
    cye3s
        1
    cye3s  
       2014-12-10 11:37:46 +08:00 via Android
    那么多子查询,快就怪了,先用sum(case...l
    )语句转横表,就是每人一行记录,包含所有问题回答,试试
    xiyangyang
        2
    xiyangyang  
    OP
       2014-12-10 12:09:13 +08:00
    对mysql不熟悉,但是要求比较急,可否请大侠写这个sql语句?谢谢了。
    airylinus
        3
    airylinus  
       2014-12-10 13:04:25 +08:00
    最简单的加速方法是用视图,怎么用查 MySQL 文档。
    另外,还可以把每个问题的答案转换成二进制,用二进制运算替代子查询。
    tranch
        4
    tranch  
       2014-12-10 13:14:44 +08:00
    SELECT section_unique_id AS "section_unique_id",
    COUNT(*) AS "hit"
    FROM responses
    WHERE (response_no,
    survey_id,
    interview_date) IN
    (SELECT DISTINCT response_no,
    survey_id,
    interview_date
    FROM responses
    WHERE question_label IN ("Q2A", "Q2B", "Q2C", "Q2B", "Q2D", "Q2E", "Q2F", "Q2G", "Q2H", "Q2I", "Q5", "Q6")
    AND value IN (1, 8, 9))
    xiyangyang
        5
    xiyangyang  
    OP
       2014-12-10 13:39:13 +08:00
    tranch:
    我们想要的是,知道有多少个这样的被调查者,他回答的问题满足这样的条件:Q2A in (1,8,9) 并且Q2B in (1,8,9), .....
    所以,question_label IN ("Q2A", "Q2B", 。。。。。。), 这样是不对的

    airylinus:
    每个问题的答案都是1到10的整数,没法转换成二进制,因为其他的计算公式还需要他们呢。
    tranch
        6
    tranch  
       2014-12-10 13:43:33 +08:00
    建议把以上问题贴到问答网站(比如 segmentfault.com)上去讨论,这里不适合发代码。
    mringg
        7
    mringg  
       2014-12-10 16:23:36 +08:00 via Android
    用子查询速度肯定慢
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   实用小工具   ·   5525 人在线   最高记录 6679   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 27ms · UTC 08:29 · PVG 16:29 · LAX 00:29 · JFK 03:29
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.