Newsletters list:

“Big Data”
Visual facilitation
Agile planning
Churn modeling
Writing Survey Questions
Theory of Constraints
Hands On Data Mining
Data Vault
Time boxing
Surrounding requirements
Cybercrime
Retrospectives
Self Service BI
Internet Surveys
How to build predictive models
New Accounting Standards
Technical Reviews
Text Mining
Meta Data
Open Source BI
Data Warehouse Testing
Customer Value Management
Value From Transaction Data
Data Visualization
Survey Design
Predictive Modelling
Applied Probability Theory
Open Source
Software Testing
Data Warehouse Development
Data Quality Policy
History of Mathematics
Usability Research
Life Time Value
Balanced Scorecards
Survey Sampling
Agile Software Development
ETL
Neural Networks
Corporate Strategy
Missing Data
Segmentation
Decision Trees
XBRL
OLAP
Data Quality Assessment
Dashboards and Scorecards
Data Mining for CRM
Data Mining Algorithms
Data Preparation
Campaign Optimisation
Affinity Analysis
Vendor Selection
System Dynamics
Credit Scoring
Forecasting
Web Usage Analysis
Customer Profitability
Problem Analysis
Customer Satisfaction & Loyalty
IT Governance
Market Research
Search Engines
Marketing Accountability
CRM
Data Mining Models
Privacy
Data Warehousing
Data Quality

Newsletter Archive


“Big Data”

Our society is overflowing with data, and data volumes keep growing at an unprecedented pace. Global growth in data volume is estimated at a staggering 60% per year, or about 10-fold in five years. Nothing seems to stop this tsunami. Relational, SQL-based architectures don’t scale sufficiently to deal with this growth, particularly of unstructured data. The McKinsey Global Institute has labeled this trend “the next frontier for innovation, competition and productivity.”
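The step from 60% per year to roughly 10-fold in five years is plain compound growth. A minimal sketch of that arithmetic follows; the function name `growth_factor` is an illustrative assumption, not something from the newsletter.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication of volume after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


# 60% growth per year, compounded over five years
print(round(growth_factor(0.60, 5), 1))  # prints 10.5
```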



Visual facilitation

Visual facilitation is a school of thought where producing and sharing graphics play a central role in creating consensus, brainstorming, strategy mapping, idea gathering, innovation, etc. All these activities revolve around group processes. Visual facilitation is an interactive way of using meeting graphics, and it brings together graphical creativity and interpersonal facilitation skills. Sometimes it is referred to as graphic facilitation or graphical recording, and it was originally popularized by a group of architects from San Francisco in the 70s.



Agile planning

Planning is a very important, and often undervalued, part of Agile software development methodologies. It is eminently lightweight, and based firmly on empirical data as well as scientifically validated estimation methods. And in the Agile spirit, investments (planning takes time) are postponed as long as possible in favor of delivering value early and often, and gathering more valid data to underpin delivery plans.



Churn modeling

Churn modeling is the practice of determining a mathematical relation between customer characteristics and likelihood to cancel or end a business relationship. These characteristics can be relatively “static” (like gender, or ZIP code) or “behavioral” (e.g., number of support queries in the last month). A churn model calculates the likelihood (probability) a customer will cease to do business with you in a given time period.
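One common way to express such a relation is a logistic model, which maps a weighted sum of customer characteristics to a probability between 0 and 1. The sketch below assumes that form. The coefficients, the feature names and the function `churn_probability` are hypothetical and chosen only for illustration; a real model would estimate the weights from historical customer data.

```python
import math

# Hypothetical coefficients for illustration only; in practice these
# would be estimated from historical data on churned and retained customers.
INTERCEPT = -2.0
WEIGHTS = {
    "support_queries_last_month": 0.4,  # behavioral characteristic
    "tenure_years": -0.3,               # relatively static characteristic
}


def churn_probability(customer: dict) -> float:
    """Logistic score: probability that the customer churns in the period."""
    score = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))


calm = {"support_queries_last_month": 0, "tenure_years": 5}
upset = {"support_queries_last_month": 6, "tenure_years": 5}
print(churn_probability(calm) < churn_probability(upset))  # prints True
```

With these assumed weights, more support queries raise the predicted churn probability and longer tenure lowers it.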



Writing Survey Questions

Writing (good) survey questions may seem more of an art than a science. But there are many guidelines on using language properly. And common survey pitfalls can be avoided, too. In many ways, writing good survey questions is like gathering requirements: by focusing on what you are trying to measure with the survey, you attempt to eliminate as much ambiguity as you possibly can.



Theory of Constraints

The Theory of Constraints (ToC) emerged after widespread adoption of Lean manufacturing. It is a model that gets used to manage productivity. As such, it fits in very nicely with Lean, and focuses on (continuous) system level improvement by identifying and alleviating the bottleneck in throughput. You round up as many resources as needed to focus improvement at the system’s blockage. The rationale for this is that throughput of the system as a whole is determined by throughput at the bottleneck.
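The rationale can be made concrete with a small sketch: for stages in series, system throughput equals the capacity of the slowest stage, so only improvements at the bottleneck change the result. The stage names, capacities and the function `system_throughput` are illustrative assumptions.

```python
def system_throughput(stage_capacity: dict) -> tuple:
    """Return (bottleneck stage, throughput) for stages that run in series."""
    bottleneck = min(stage_capacity, key=stage_capacity.get)
    return bottleneck, stage_capacity[bottleneck]


# Hypothetical capacities in units per hour
line = {"cutting": 120, "welding": 45, "painting": 80}
print(system_throughput(line))  # prints ('welding', 45)

# Improving a non-bottleneck stage leaves throughput unchanged
print(system_throughput({**line, "painting": 200}))  # prints ('welding', 45)

# Improving the bottleneck raises throughput
print(system_throughput({**line, "welding": 70}))  # prints ('welding', 70)
```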



Hands On Data Mining

Data mining is part science, part art. Tools play an important role, but not nearly as important as the ingenuity and versatility of the modeler. Data mining is a craft that you acquire by building a lot of models, using statistical foresight and knowledge of the data you are working with, and by reviewing all the products you have delivered and how they are being used. Or by finding out why your models aren’t being used anymore…



Data Vault

Data Vault (DV) is an approach to data modeling that is specifically optimized for enterprise data warehouses (EDWs). It was invented in the 1990s by Dan Linstedt, and published some 10 years later. Adoption has been slow, yet those who have embraced this new technology see clear benefits. It provides the benefits of Bill Inmon’s 3rd normal form data warehouse (DWH) models with the agility of Kimball-style approaches.



Time boxing

A time box is a fixed amount of time that you allocate to further progress towards some task. When you are running a project, there are three components that form an “iron triangle”: time, scope and resources. Of these three, time is most easily measured. Time boxing helps you “fit” what you can accomplish into the time you have reserved for this (the time box).



Surrounding requirements

Requirements form the beginning of any (software) project: someone has a request that needs clarifying in order to get the project off to a good start. But producing valuable BI solutions from vague requirements, often using unstable technology, is very hard work!



Cybercrime

Cybercrime is a pernicious problem with many faces. It can be relatively innocent "cyber graffiti", without commercial intent. This is sometimes called "defacement", where hackers replace a homepage with some text, often of a political or religious nature. But it can also be vicious, like theft of credit card data, click fraud, identity takeover, etc. Fighting cybercrime requires a hybrid approach involving both sophisticated technology and enforcement of safe work procedures.



Retrospectives

Retrospectives are a standard ingredient of Agile software development practices. They lie at the heart of any successful Agile implementation. Retrospection means looking back; retrospectives are a more or less formal part of Agile methodologies which are meant to continuously improve the team's effectiveness (so-called "velocity": output per time period).



Self Service BI

Business intelligence (BI) creates value by empowering knowledge workers. Self service BI is a relatively new trend in response to the ever growing need for timely, useful information. It embodies an acknowledgement that meeting these needs should be agile and flexible.



Internet Surveys

Internet surveys offer significant benefits over paper and pencil in terms of layout, formatting, and most importantly routing of questions. You have maximum control and liberty to use colors, logos, pictures, etc., and to manipulate their usage in light of your research objectives. They also enable fast turnaround times for analysis and reporting.



How to build predictive models

Building predictive models is the "bread and butter" of applied data mining. Because you can compare using a model with your current "best practice", it is relatively straightforward to monetize the added value of data mining. This calculation should be a default part of your process. Not just to support business decision making, but also to foster buy-in for your work.



New Accounting Standards

Financial accounting has traditionally been a mix of formality and fluidity. Some rules are quite rigid, and they're enforced using complex and detailed guidelines. At the same time, interpretation can be "fluid", as when it is adapted to meet business needs. Market pressure has raised the need for standards that are better aligned with business needs, yet at the same time provide the consistency and control required to ensure reliability and comparability.



Technical Reviews

Technical reviews are management's measurement of (technical) product quality. You examine suitability of a software product. The main difference between informal reviews (as in pair programming) and formal reviews is that the purpose of the latter is to apprise management of quality and fitness for its intended use. Reviews can pertain to many things like: documentation, code, procedures, specifications, standards, etc.



Text Mining

The digital universe is growing at an astounding pace, and the majority of that growth is in unstructured information. Text mining plays a crucial role in extracting value from these oceans of data. An estimated 80-95% of all data is in unstructured format. Email, contracts, policies, patents, all potentially contain enormous value, if you know how to process them in an automated fashion.



Meta Data

Meta data are data about data. A more descriptive definition says meta data is everything you need to know in order to interpret information in its context. Examples are column headers in spreadsheets, the units of measurement, date and time you recorded a number, the definition used for "customer" or "lapsed", but also all "technical" information required to run a database like primary or foreign key, field length/type, etc.



Open Source BI

Open source business intelligence (BI) has come a long way: from being the new kid on the block, the ugly duckling, to, for instance, Talend being positioned as a visionary in Gartner's Magic Quadrant for data integration tools as of 2009. Open source is a force to be reckoned with.



Data Warehouse Testing

Data Warehouse (DWH) testing is a new and relatively immature discipline. There remains considerable disagreement whether DWH testing is any different from "normal" software testing, or whether there is merit in a DWH-specific testing approach.



Customer Value Management

Customer value management, together with marketing accountability, are central themes in modern day marketing. Instead of selling products in a market, we take a different tack: you develop customers to their fullest potential. Customers are explicitly acknowledged as the instrumental source of revenues. In Peter Drucker's words: the only profit center is a customer whose cheque hasn't bounced.



Value from Transaction Data

Transaction data are atomic building blocks that create value in the exchange between consumer and corporation. Extracting value from transaction data is a bit like trying to sip from a firehose. A value stream often consists of multiple (many) transactions. You need to consider these transactions together in order to understand the relation. And unless you understand the relation, you cannot interpret separate transactions.



Data Visualization

Data visualization is about presenting information in "some" graphical form. The human eye is a very powerful pattern detection instrument. When you transform a table into a graph, for instance, you don't add new information, but for many people it is easier to see long-term trends and individual dips and spikes. Visualizations are an important and valuable analytic tool that enables discerning patterns in data that would otherwise be extremely difficult to find.



Survey Design

Survey data are a useful complement to electronic registrations of behavior. They allow you to relate opinion to actions. Survey design choices impact your results, and these measurement effects ought to be mitigated. You can never take survey data at "face value" without an understanding of how the sample was selected and how subjects were interviewed.



Predictive Modeling

Predictive modeling is where data miners’ most visible “claim to fame” lies. Advances in software usability, computing power, and algorithms have brought this capability within the realm of modern business. Whether it is direct marketing, credit scoring, fraud detection, or forecasting, data mining has proven its worth.



Applied Probability Theory

Probability was only "invented" fairly recently, and is a sub-branch of mathematics. Commonly, Fermat and Pascal (mid-17th century) are considered the founding fathers of probability theory. Because chance events are all around us, they pervade all of our lives.



Open Source

Open source is commonly associated with "free" software. This, however, is a misconception. Open source relates to "open standards", where source code and standards are laid in the public domain, free for all to see. The term "open source" was adopted in April 1998. Its fundamental philosophy is that if every developer contributes ideas, this "massive parallel process" will lead to the best, most robust applications. Indeed, some of the most vital applications (like running the NYSE) are based on open source.



Software Testing

Testing is an information gathering activity, aimed at supporting decision making. Since testing is never "free", the information gathered from it must have sufficient value for decision making to offset the costs. Costs can be investments in testing, as well as economic costs associated with a delay in release dates. Unless the outcome of testing actually influences decision making, you might as well forgo it. Testing has value insofar as it contributes to better (management) decisions about quality improvement, or about going ahead to release a product after sufficient data points have been gathered to mitigate risks.



Data Warehouse Development

Software projects usually begin by gathering requirements and then building what is needed. Data warehouse (DWH) projects on the other hand typically begin by building what is needed, and only then do you wind up with requirements. This calls for a radically different approach to development and project planning.



Data Quality Policy

There’s a direct line from Corporate Governance to IT Governance to Data Governance. Since the Sarbanes-Oxley Act (SOX) and similar regulations in Europe, CEOs are more aware than ever of the risks involved with erroneous (financial) reporting. Data quality policies ensure compliance with reporting obligations, and send a message of “care” to customers.



History of Mathematics

Mathematics was never invented but rather discovered. It exists in every aspect of our lives, all around us. Humans have been known to use numbers for about 8000 years. Around 3500 BC the first cross tabs were kept, which Assyrians used for collecting taxes. Andean cultures used quipus (knotted cords) around 2500 BC to keep track of counts. Symbolic mathematics with equations, theorems and proofs began around 500 BC. In the 17th century calculus was invented (independently by Newton and Leibniz), and modern abstract algebra, where symbols like x and y denote arbitrary entities, stems from the mid-19th century. As this historical account shows, the growth spurt in mathematical development is of relatively recent origin.



Usability Research

Usability is invisible. As long as things are going well, nobody seems to notice. Hardly anyone ever complains that something is too easy to use, right? Interestingly, you measure usability by the absence of problems that users experience. Usability becomes ever more important as technology permeates our society and increasingly gets used by people with less technical background than, say, 20 years ago.



Life Time Value

Life Time Value (LTV), also called Customer Lifetime Value (CLV), is an elusive concept that has strong intuitive appeal to marketers. Future income streams are discounted and summed to calculate the total value of the customer over the course of a "lifetime". For marketers it may be intuitive to treat customers as assets. LTV (CLV) provides a quantitative "bridge" to link investment in marketing and customers to profit and ultimately market capitalization. It helps make ROI on marketing expenses accountable and quantifiable.
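The "discounted and summed" calculation can be sketched in a few lines. This is a minimal illustration that assumes one expected income amount per period, received at the end of each period, and a constant discount rate; the function name `lifetime_value` and the example figures are hypothetical.

```python
def lifetime_value(expected_income: list, discount_rate: float) -> float:
    """Sum of future income, each amount discounted back to the present.

    expected_income[0] is received at the end of period 1,
    expected_income[1] at the end of period 2, and so on.
    """
    return sum(
        income / (1.0 + discount_rate) ** period
        for period, income in enumerate(expected_income, start=1)
    )


# Hypothetical customer: 100 per year for three years, discounted at 10%
print(round(lifetime_value([100.0, 100.0, 100.0], 0.10), 2))  # prints 248.69
```

A fuller model would typically also weight each period by the probability that the customer is still active, which ties LTV to churn estimates.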



Balanced Scorecards

Balanced Scorecards (BSc) were an "invention" of Kaplan and Norton in the 1990s and grew out of the realization that driving a business on financials only has serious limitations and may well hinder future growth. "The Balanced Scorecard translates an organization's mission and strategy into a comprehensive set of performance measures that provides the framework for a strategic measurement and management system" (Kaplan & Norton, p. 2).



Survey Sampling

Sampling applies whenever we choose to draw inferences from a limited number of observations rather than the entire “population.” There can be many reasons to use sampling, like not having all data available, or prohibitive costs for acquiring all data. Sometimes all the data are there (as in a census), but are too unwieldy to handle.



Agile Software Development

Agile software development methods are a ‘family’ of approaches sharing common traits that distinguish them from traditional “waterfall” methods. Waterfall methods begin development with extensive requirements gathering and documentation, and getting “sign-off” on detailed, explicit specifications. “Agile” methods (Scrum, XP, and DSDM probably being the best known), on the other hand, acknowledge the insurmountable challenge of ever getting the specs exactly right. They focus on leveraging creative input from developers and minimizing unnecessary paperwork.



ETL

ETL stands for Extract, Transform, Load. This process merges and integrates information from source systems into the data warehouse (DWH). Authorities like Gartner, Kimball or Inmon warn us that 60-90% of DWH development budgets are typically spent on ETL. Major project risks reside here, in particular when sources prove unreliable and poorly documented.



Neural Networks

Neural Networks (NNs) are sometimes considered the epitome of data mining algorithms. Loosely modeled after the human brain (hence the name), NNs "learn" patterns from repeated exposure to data in much the same way our brain learns to recognize patterns in the outside world. That is how NNs identify fraudulent transactions, high potential prospects, etc. They have been used to improve manufacturing process control, predict exchange and interest rate fluctuations, predict utility usage, and many other applications.



Corporate Strategy

Strategy is one of those elusive concepts that everybody "knows" but few people can define. Freedman: "strategy is the framework of choices that determine the nature and direction of an organization. The choices in the framework relate to what products and services will be offered and not offered, what markets will be served and not served, and what capabilities are needed to take products to markets."



Missing Data

Missing data are a fact of life. No matter how hard we try, and how carefully we assemble our data sets, there will always be missing data. In fact, sometimes data are supposed to be missing, for instance because a particular attribute does not apply to a person. In some cases, the pattern in missing data can be as informative as the information present. In general, however, the effect of missing data is to limit the amount and quality of available information.



Segmentation

Segmentation refers to the process of cutting up a heterogeneous population into chunks that themselves are considered to be more or less homogeneous. The purpose is to identify subgroups who display similar behaviors and have similar needs. This makes the market more transparent and allows for a differentiated strategy per segment.



Decision Trees

Decision trees are some of the most flexible, intuitive and powerful data analytic tools for exploring complex data structures. Because decision trees can be used for both prediction and insight, any data miner can gain from applying them in diverse projects.



XBRL

XBRL, an acronym for eXtensible Business Reporting Language, will permanently transform the creation, exchange and comparison of financial information.

XBRL is an extension of XML (eXtensible Markup Language) and was ‘invented’ by Charles Hoffman, CPA, in April 1998. The first official specifications for XML were released in February 1998 by the World Wide Web Consortium (W3C).

Although at the moment mainly used for exchange of financial information, it offers the possibility to break down “language barriers” for any kind of business data exchange.



OLAP

OLAP, short for On-Line Analytical Processing, performs a unique function in between SQL and spreadsheet functionality. There are four core requirements for which neither SQL nor spreadsheets are fully adequate. These are support for:

  • multiple dimensions (albeit doable with SQL)
  • hierarchies (technically possible with SQL but very cumbersome)
  • dimensional calculations, and
  • separation of structure and representation


Data Quality Assessment

Data quality (DQ) assessment is as much about assessing data as it is about the impact data quality (or lack thereof) has on business processes. The business case for DQ comes from documenting how data flaws hamper the business. In this information age data is considered an asset that should be managed and leveraged just like any other tangible asset. DQ assessment is a part of auditing to ensure responsible corporate governance.



Dashboards and Scorecards

Dashboards and scorecards are where strategy, corporate performance management (CPM) and business intelligence (BI) come together. When implemented properly, they communicate how executing strategy should become manifest. They display results and progress, enabling management by objective. The metrics should translate an organization's strategy into observable outcomes, and allow performance to be confronted with goals. This is where the strategy rubber meets the road.



Data Mining for CRM

Customer Relationship Management (CRM) was an over-hyped term that has fallen from grace. Nonetheless, principles for managing the value of a portfolio of customers remain as valid as before. Data mining can serve many roles by supporting fact-based decision making to optimize customer relationships.



Data Mining Algorithms

Data mining algorithms come in many shapes and forms. Because the profession is so young, there is no agreed-upon comprehensive taxonomy of algorithms yet. One distinction everybody agrees on, however, is supervised versus unsupervised algorithms. Most new data mining algorithms are developed in the Machine Learning community.



Data Preparation

Data Preparation appears to be where data miners spend most of their time. Some say that 80%-90% of time in an average data mining project is spent “merely” preparing the data. And this is time well spent to end up with a good predictive model, yet data preparation and feature extraction are underrepresented in the data mining literature.



Campaign Optimisation

Campaign optimization can take place at three levels:

  • Finding the best possible targeting model for a campaign
  • Finding the optimal target group(s) within a campaign, possibly contacting the customer by one or more channels
  • Determining the best possible offer across multiple simultaneous campaigns


Affinity Analysis

Affinity analysis is an association technique based on the premise that the products consumers purchase, and the preferences they express, are indicative of their future behavior. By identifying product affinity patterns, one can predict future behavior to enhance service and promote cross-selling.
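Product affinity is often quantified with three measures for a rule "A implies B": support (how often A and B occur together), confidence (how often B occurs given A) and lift (confidence relative to B's overall frequency). The sketch below computes them from a list of baskets; the function name `rule_metrics` and the sample baskets are assumptions for illustration.

```python
def rule_metrics(baskets: list, antecedent: str, consequent: str) -> tuple:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    if n == 0 or with_a == 0 or with_c == 0:
        raise ValueError("both items must occur in at least one basket")
    support = with_both / n
    confidence = with_both / with_a
    lift = confidence / (with_c / n)  # above 1 suggests a positive affinity
    return support, confidence, lift


# Hypothetical transaction data: one set of products per basket
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, "bread", "butter")
print(support)  # prints 0.5
```

Here butter appears in two of the three bread baskets but in only half of all baskets, so the lift is above 1.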



Vendor Selection

Vendor selection is a crucial skill in the recent trend towards outsourcing and offshoring. Outsourcing grew from 1% in 2003 to 9% in 2007 (Meta Group). Offshoring was used by 26% of all institutions in 2003, and has grown to over 70% (Deloitte). This has included both a move to low-wage countries and the divesting of non-core activities.



System Dynamics

System dynamics are characterized by the fact that adding up intentions and actions of the constituent parts is not enough to explain an entire system's behavior. Such systems are also often called non-linear, complex systems and one of their characteristics is extreme sensitivity to initial conditions. The metaphorical example is a butterfly in India flapping its wings that precipitates a hurricane in America.



Credit Scoring

Credit scoring has dramatically changed the face of the underwriting business. It is a “young” discipline. Only 30 years ago, the majority of credit acceptance decisions were taken intuitively by underwriters. As statistical evidence accumulated, and early adopters of automated techniques conquered the markets, a sea change took place. Nowadays, almost all credit decisions are processed automatically, using scorecards to determine default (and sometimes profitability) odds.



Forecasting

Forecasting is the “art” of predicting the future. Given prevailing conditions, and within bounds that can and will be influenced, an estimate of future demand is derived. Sales targets should follow forecasts, never the other way around. Forecasts take into account what attainable conditions need to be created to succeed.



Web Usage Analysis

Web Usage Analysis is one of the frontiers in data analysis. Because every mouse click gets recorded, every page viewed, but also when, you can closely watch in minute detail every step a person takes on the web. At the moment, we are grossly lacking the conceptual models to fully exploit the richness these data might offer. Not only which pages are viewed in conjunction, but also the chosen navigation paths can offer tremendous insight.



Customer Profitability

Measuring and understanding customer profitability at the individual level enables a firm to appreciate the distribution of relationship value so it can allocate resources accordingly. Valuation of a firm equals the aggregate value of its customer relationships. Hence, the search for shareholder value is akin to managing a portfolio of customers.



Problem Analysis

Problems manifest themselves as the discrepancy between an “as is” state and an “as it should be” desire. The answer to “What’s the problem?” should be complemented with “Who has a problem?” and “Why is that a problem?” to further understanding. Different people suffer in different ways from the same “situation”. For every person or party, you must find out what the essence of their problem is. This will often reveal many facets of the same “problem”.


Customer Satisfaction & Loyalty

Customer Satisfaction is an important driver of business success because it embodies value creation for the customer. The assumption is that satisfaction results in repeat business, and as positive experiences accumulate, the relation will strengthen. This will immunize customers against alternative offers, so you escape competition on price.


IT Governance

IT Governance is about encouraging and leveraging creative powers throughout the enterprise, while at the same time ensuring compliance with the company's strategic direction and policies. This conundrum can be resolved by installing the appropriate decision structures. Good IT Governance simultaneously empowers and controls. In short, IT Governance keeps resources productive and aligned.


Market Research

Market research can deal with both people's behavior and their attitudes or opinions. Research methods can be organized from subjective (e.g. self-report questionnaires) to objective (observation or behavioral data).


Search Engines

Search engines are the window to the web. In May 2004 there were 50 million websites; by October 2006 this had doubled to 100 million (Netcraft research). Google recently counted 8 billion unique pages. Given some 6.5 billion searches per month, it becomes clear how important search engines are for organizing and accessing these oceans of data.


Marketing Accountability

Marketing Accountability refers to a fundamentally new way in which we view marketing expenses. Whereas the marketing budget used to be seen as an expense, nowadays it is seen rather as an investment in the relationship with customers. With this new perspective, marketing expenses have come under the same kind of scrutiny we apply to other investment decisions: what's the ROI?


CRM

CRM, Customer Relationship Management, is a business strategy. CRM gained interest at a time when customer-centric marketing became the fashionable thing to do. It is an attempt by large corporations to mimic the customer intimacy that small-scale suppliers can offer because they understand their customers’ needs.


Data Mining Models

What is a model? A model is a purposeful simplification of reality. Models can take on many forms: a built-to-scale lookalike, a mathematical equation, a spreadsheet, a person, a scene, and many others. In all cases, the model uses only part of reality; that’s why it’s a simplification. And in all cases, the way one reduces the complexity of real life is chosen with a purpose. The purpose is to focus on particular characteristics, at the expense of losing extraneous detail.


Privacy

Privacy is one of those topics that nobody cares about until their own privacy is being violated. Privacy threats have been compared to George Orwell’s “1984” where a totalitarian regime decimated individual freedom. Nowadays, the privacy threat doesn’t come from communist states but from capitalism, free markets, exchange of digital information and smart use of advanced technology.

Data Warehousing

Data Warehousing was an innovation from the 1990s that promised to change the data landscape for good. How far have we come? Many vendors have entered the marketplace because it makes sense to bring together data from throughout the organization, and this will continue to make sense in the future.


Data Quality

Data quality gives a competitive edge.
Everybody agrees how important good data quality is. And everybody has been agonized by erroneous data. We've all lost a lot of time working with crappy data, and "Garbage In, Garbage Out" is probably the most commonly cited proverb in IT. Then how come it is always so hard to find volunteers to do something about it?

Contact
XLNT Consulting
Tom Breur, Principal

E-mail
Email Tom Breur

Telephone
+31-6-463 468 75

Address
Langestraat 8-03
5038 SE Tilburg
the Netherlands