Index_D

D

data

breakdown, 33

consolidation, 53

as data profile input, 126-129

decay, 92

delivery delays, 13

discovery from, 153-154

duplicate, 268

extracting, 58-59, 126-129

flattening, 59

integration, 7, 62

loading, 61

matching, 56-57

moving/restructuring, 52-62

normal forms of, 181-184

as precious resource, 3-5

qualifying, 82-83

rejection, 11

replication, 6-7

standardized representation, 170

using, 62-63

data accuracy

air quality analogy, 34

business case for, 103-118

characteristics, 29

consistency, 29-30

content, 29

costs, 108

as data quality assurance cornerstone, 257-259

decay, 50-51

defined, 29-32

form, 29

as fundamental requirement, 3, 23

improvement effects, 40

lack of, 3

object-level, 31-32

percentages, 34

problems, occurrence of, 67

summary, 41-42

total, 34-35

value of, 103-107

data capture processes, 89-92

auto-assist in recording process, 91-92

distance between event and recording, 90

error checking in recording process, 92

evaluation factors, 90

fact availability at recording, 90

feedback to recorder, 91

information verification at recording, 91

motivation of person doing recording, 91

number of handoffs after recording, 90

remedies, 95

skill of person doing recording, 91

time between event and recording, 90

data cleansing, 59-60

adding, 96-97

defined, 59

leaving out rows and, 60

problems, 59

routines, 59, 60

as short-term remedy, 97

tools, 21-22, 53

uses, 96-97

as value-level remedy, 171

data elements

analysis, 37

delay-prone, 50

matching, 56-57

revealing significant variances, 89

use of, 33

value indicators, 46

See also values

data entry

data rules checked during, 233

deliberate errors, 47-48

flawed processes, 44-46

forms, 45

as inaccuracy source, 44-49

mistakes, 44

null problem, 46

processes, 45

system problems, 48-49

windows, 45

data events analysis, 89-94

conversion to information products, 93-94

data capture processes, 89-92

data decay, 92

data movement/restructuring processes, 92-93

points of examination, 89

data gathering (complex data rules), 239-240

business procedures, 240

database-stored procedures, 239

source code scavenging, 239

speculation, 240

See also complex data rule analysis

data gathering (simple data rules), 221-224

business procedures, 223-224

database-stored procedures, 222-223

source code scavenging, 221-222

speculation, 224

See also simple data rule analysis

data management

lack of, 1

team training, 15

technology, 1

data marts, 61

data models, 189-190

building, 209

developing, 209-210

for primary/foreign key pairs identification, 200

validating, 210

data monitoring, 20-21

adding, 96

continuous checking, 100-101

database, 21

defined, 20

post-implementation, 99-101

transaction, 20-21

validation, 100

data profiling, 20

"analysis paralysis," 142

analysts, 123, 127, 134

analytical methods, 136-140

approaches, 20

assertion testing, 137

bottom-up approach, 131

column property analysis, 132, 143-172

complex data rules, 240-244

conclusion, 83

as core competency technology, 142

data rule analysis, 134-135, 215-245

data type and, 159

defined, 20, 53, 119

discovery, 136

emergence, 119-120, 141

errors, 82

extraction for, 126-129

as foundation for remedies, 258-259

general model, 123-130

goals, 122

important databases, 140

inputs, 124-129

iterations and backtracking, 139

for knowledge base creation, 122

metadata verification, 139

methodology, 130-135

model illustration, 123

output, 20

overview, 121-142

participants, 123-124

process, 122

process steps, 131-132

products, 53

of secondary data stores, 140

software support, 139-140

steps diagram, 131

structure analysis, 132-134, 173-214

technology, 119-120, 122, 258

text columns, 163

value rule analysis, 135, 246-254

visual inspection, 138

when to use, 140-141

data profiling outputs, 129-130

facts, 130

latency, 130

metadata, 129

data profiling repository

business objects, 272

content, 272-278

data rules, 217, 277

data source, 273-274

defined, 121

domains, 273

inconsistency points in, 167

information, 124

issues, 278

schema definition, 272

synonyms, 276

table definitions, 274-276

value rules, 277-278

data quality

awareness, 9, 10-12

characterization of state, 9

defined, 24

definitions, 24-27

emergence, 70

as everyone's job, 257

facts, 130

high, moving to position of, 256-257

improvement requirements, 14-15

issues management, 80-102

as maintenance function, 104

as major corporate issue, 255-256

money spent on, 105

as universally poor, 10

visibility, 1

data quality assessment project, 110-112

age of application, 112

future costs potential, 111

hidden costs potential, 111

identified costs, 111

importance to corporation, 111

likelihood of major change, 112

primary value, 259

pure, 257

robustness of implementation, 112

See also business case

data quality assurance, 67-79

activities, 75-78

comparison, 74-75

data accuracy as cornerstone, 257-259

department, 69-71

educational materials, 18

elements, 16

experts and consultants, 17

as explicit effort, 256-257

as full-time task, 70

functions, 71

group, 69-71

implementation, 118

initiatives, 23

methodologies, 18

organizing, 257

program components, 71

program goals, 68

program structure, 69-78

project services, 75-77

rationale, 105

software development parallel, 70

software tools, 18-22

stand-alone assessments, 77

summary, 78-79

teach and preach function, 77-78

team, 68, 76, 77

technology, 16-22

data quality assurance methods, 71-75

comparison illustration, 72

inside-out, 72-73

outside-in, 73-74

types of, 71-72

data quality problems, 3-23

fixing requirements, 14-15

hiding, 11

impact, 12-14

liability consequences, 12

reasons for not addressing, 12

scope, 14

data rule analysis, 134-135

complex, 135, 237-245

definitions, 216-220

simple, 134-135, 215-236

data rule checkers, 234

data rule repository, maintaining, 235

data rules

in assertion testing, 137

column properties vs., 220

data profiling repository, 217, 277

dates, 226-227

defined, 134, 215, 217, 238

derived-value, 229

durations, 227

evaluation, 232-234

exceptions, 219

execution, 225-226, 241

hard, 218-219, 238

loose definition, 219

multiple-rows/same column, 230

as negative rules, 217

object subgrouping columns, 227-228

process rules vs., 219-220, 238

relationships, 215

soft, 218-219, 238

sources, 137

syntax examples, 218

tight definition, 219

types of, 226-230, 241-244

work flow, 228-229

See also complex data rules; simple data rules

data source, 273-274

data transformation routines, building, 170-171

data types, 157-159

character, noncharacter data in, 157-158

defined, 157

profiling and, 159

typical, 158

See also column properties

data warehouses, 61

database management systems (DBMSs), 22

correct data, 22

for structural role enforcement, 134

database monitors, 21

database procedures

complex data rule analysis, 239

simple data rules analysis, 222-223

databases, 6

data integration, 7

definitions, 190

demands on, 8

design anticipation, 27-28

errors, 49

factors, 10

flexibility, 28-29

importance, 8

quality, 9

source, 54, 56-57, 59, 169-170, 212

target, 56-57

date(s)

columns, 195-196

complex data rules, 241-242

domain, 146

extreme numbers on, 252

simple data rules, 226-227

decay

in cause investigation, 92

problems, 92

decay-prone elements, 50-52

accuracy, over time, 51

characteristics, 50

decay rate, 52

handling, 51-52

decision-making efficiency, 41

decisions

based on hard facts, 115-116

based on intuition, 116-117

based on probable value, 116

defensive checkers, 96

deliberate errors, 47-48

correct information not given, 47-48

correct information not known, 47

falsifying for benefit, 48

See also errors

denormalization, 59

cases of, 182-183

use of, 59

denormalized form, 182-183

denormalized keys, 179

denormalized tables, 182, 183

data repetition, 183

in relational applications, 214

derived columns, 179

derived-value rules, 229

descriptor columns, 195

discovery, 136

of column properties, 153-154

of functional dependencies, 191, 196-197

homonym, 204

in structure analysis, 191-192

of synonyms, 203-204

discrete value list, 160-161

domains, 145-148

concept, 145

data profiling repository, 273

date, 146

defined, 145

external standards, 147

macro-issues, 148

metadata repository, 145

micro-issues, 148

special, 164-165

unit of measure, 147-148

zip code, 147

domain synonyms, 186-187

defined, 186

existence, 187

structural value and, 186

testing for, 205

See also synonyms

duplicate data, 268

durations, 227



Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
ISBN: 1558608915
Year: 2003
Pages: 133
Authors: Jack E. Olson
