Further Research on the Combined Use of Benford’s Law and Other Methods to Evaluate Statistical Data Quality

Testing data quality with Benford’s law is an important method that is already widely applied in practice. The method nevertheless has certain limitations. To address them, this paper further explores how to combine Benford’s law with outlier (anomaly) detection and data mining techniques, so as to identify the specific samples that may have data quality problems and the patterns in which such problems occur. The proposed approach is then applied in an empirical analysis of the data quality of the main economic indicators of China’s insurance industry for 2006-2011. The results indicate that the approach is reasonable and effective.
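The abstract does not spell out the Benford check itself, so the following is only a minimal sketch of the standard first-digit test, not the paper's procedure. It assumes a chi-square goodness-of-fit statistic (a common choice, not confirmed by this record), and the function names `benford_expected`, `leading_digit`, and `benford_chi_square` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit at index 0,
    # e.g. 0.00123 -> '1.230000000000000e-03'.
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law.

    Assumption: a chi-square goodness-of-fit test with 8 degrees of freedom;
    the 5% critical value for 8 degrees of freedom is about 15.507.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
```

A statistic above the critical value only indicates that the data set as a whole departs from Benford’s law; it does not say which records are responsible, which is the limitation the abstract proposes to address by adding outlier detection and data mining.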

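Funding: This paper is one of the interim results of the National Social Science Fund of China key project "Research on the Quality Management of National Statistical Data" (09AZD045).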

Data Quality; Benford’s Law; Anomaly Detection; Data Mining

10.19343/j.cnki.11-1302/c.2013.08.001

C81

Statistical Research (统计研究)

2013, Issue 8

ISSN:1002-4565

Core Journals of China
