Which SAS Enterprise Miner tool would you use to exclude extreme outliers from your analysis?

Chapter 1 Introduction

1.1 Introduction to SAS Enterprise Miner

The SAS Enterprise Miner interface simplifies many common tasks associated with applied analysis. It offers secure analysis management and provides a wide variety of tools with a consistent graphical interface. You can customize it by incorporating your choice of analysis methods and tools.

SAS Enterprise Miner Interface Tour

Menu bar and shortcut buttons: The interface window is divided into several functional components. The menu bar and corresponding shortcut buttons perform the usual windows tasks, in addition to starting, stopping, and reviewing analyses.

Project panel: The Project panel manages and views data sources, diagrams, results, and project users.

Properties panel: The Properties panel enables you to view and edit the settings of data sources, diagrams, nodes, results, and users.

Help panel: The Help panel displays a short description of the property that you select in the Properties panel. Extended help can be found in the Help Topics selection from the Help main menu.

Diagram workspace: In the diagram workspace, process flow diagrams are built, edited, and run. The workspace is where you graphically sequence the tools that you use to analyze your data and generate reports.

Process flow: The diagram workspace contains one or more process flows. A process flow starts with a data source and sequentially applies SAS Enterprise Miner tools to complete your analytic objective.

Node: A process flow contains several nodes. Nodes are SAS Enterprise Miner tools connected by arrows to show the direction of information flow in an analysis.
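The idea of a process flow, a data source followed by tools applied in sequence, can be illustrated with a minimal Python sketch. This is only an analogy and not how SAS Enterprise Miner is implemented; `run_flow`, `keep_complete`, `add_double` and the sample rows are hypothetical names and data invented for illustration.

```python
from typing import Callable, Iterable

Row = dict
Node = Callable[[list[Row]], list[Row]]  # hypothetical: a node maps rows to rows


def run_flow(data_source: Iterable[Row], nodes: list[Node]) -> list[Row]:
    """Apply each node in order, passing its output to the next node."""
    rows = list(data_source)
    for node in nodes:
        rows = node(rows)
    return rows


def keep_complete(rows: list[Row]) -> list[Row]:
    """Hypothetical node: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_double(rows: list[Row]) -> list[Row]:
    """Hypothetical node: derive a new column from an existing one."""
    return [{**r, "x2": r["x"] * 2} for r in rows]


source = [{"x": 1}, {"x": None}, {"x": 3}]
result = run_flow(source, [keep_complete, add_double])
print(result)  # [{'x': 1, 'x2': 2}, {'x': 3, 'x2': 6}]
```

The order of the list plays the role of the arrows in a diagram: each node only sees what the previous node passed on.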

SEMMA tools palette: The SAS Enterprise Miner tools available to your analysis are contained in the tools palette. The tools palette is arranged according to a process for data mining, SEMMA. SEMMA is an acronym for the following:

  • Sample: You sample the data by creating one or more data tables. The samples should be large enough to contain the significant information, but small enough to process.
  • Explore: You explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.

Data manipulation is an important part of the data mining process. Filtering out unwanted observations and removing inaccurate or skewed variables helps to ensure that the analysis is accurate. SAS® Enterprise Miner™ includes two nodes created specifically for this purpose: one removes variables and the other filters observations.

This tip focuses on two nodes used for filtering and removing variables and how they can be used:

  • Drop Node
  • Filter Node

The Drop Node

The Drop Node can be used to remove any unnecessary variables from the Enterprise Miner data sets. Any of the following role types can be dropped from scored data sets:

  • Assess
  • Classification
  • Frequency
  • Hidden
  • Input
  • Rejected
  • Residual
  • Target

The Drop Node can be used within decision trees to trim the size of the data sets and metadata during the tree analysis.

The Drop Node can be found within the ribbon under the Modify tab.

The Drop Node can be dragged onto a SAS Enterprise Miner diagram and joined to other nodes using an arrow to direct the flow of the data through the system.

The Drop Node allows you to specify the variables that you wish to remove from the SAS data set. To view the options available for the Drop Node, click on the Drop Node in the diagram; its properties are displayed in the left pane.

By default, the ‘Drop from Tables’ attribute is set to ‘No’. This means that any variables selected to be dropped are removed from the exported metadata only. If this value is set to ‘Yes’, the variables are also dropped from the exported tables, and the node creates data sets instead of views for the data specified.

Within the ‘Drop Selection Options’ you can choose the roles of the variables that you would like to drop from the data analysis. The available roles are listed below:

  • Assess
  • Classification
  • Frequency
  • Hidden *
  • Input
  • Predict
  • Rejected *
  • Residual
  • Target
  • Other

* Variables that have a role of Hidden or Rejected are dropped from the data set by default.

Within the Baseball data set the following roles have been set. On running the Drop Node with its default settings, we would expect the logSalary variable, which has a role of Rejected, to be dropped from the data set.

To run the Drop Node, right-click on the last node in the sequence and select Run. A green tick indicates that the node has run successfully:

On running the flow with the default settings, the output log shows that one interval variable with a role of Rejected was found. This variable was removed from the data set.
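The default behaviour described above, dropping every variable whose role is Hidden or Rejected, can be sketched in plain Python. This is an illustration of the rule rather than the node's actual implementation: `drop_by_role` is a hypothetical helper, and apart from logSalary the variable names, roles and values are made up.

```python
# Roles dropped by default, as described for the Drop Node above.
DEFAULT_DROP_ROLES = frozenset({"Hidden", "Rejected"})


def drop_by_role(rows, roles, drop_roles=DEFAULT_DROP_ROLES):
    """Return (rows, kept_names) with every variable in a dropped role removed."""
    kept = [name for name, role in roles.items() if role not in drop_roles]
    return [{name: row[name] for name in kept} for row in rows], kept


# logSalary comes from the text; the other names, roles and values are hypothetical.
roles = {"nHits": "Input", "Salary": "Target", "logSalary": "Rejected"}
rows = [
    {"nHits": 81, "Salary": 475.0, "logSalary": 6.16},
    {"nHits": 130, "Salary": 480.0, "logSalary": 6.17},
]

trimmed, kept = drop_by_role(rows, roles)
print(kept)  # ['nHits', 'Salary']
```

Passing a different `drop_roles` set mimics changing the ‘Drop Selection Options’, for example `drop_by_role(rows, roles, drop_roles={"Target"})`.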

The Filter Node

The Filter Node enables you to apply a filter to the training data set in order to exclude outliers or other observations that you do not want to include in your data mining analysis. Outliers can greatly affect modelling results and, subsequently, the accuracy and reliability of trained models.

Within SAS Enterprise Miner, the Filter Node can be found in the ribbon within the Sample tab.

The Filter Node can be dragged onto a SAS Enterprise Miner diagram and joined to other nodes using an arrow to direct the flow of the data through the system.

The Filter Node can be used to remove missing values, to work with normalised values, or to customise the filtering method for both class and interval variables.

The ‘Export Table’ option allows you to specify which table to export after training the data set. This value can be set to one of the following:

  • Filtered: The default option, this allows the filtered data to be passed through as a view for further processing.
  • Excluded: This passes through any filtered out data as a view for further processing.
  • All: This passes all of the data through as a view and creates an indicator variable to identify any filtered records.
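The three ‘Export Table’ settings listed above can be illustrated with a small Python sketch. It assumes that some rule has already decided which rows are filtered out; `export_table`, the rule `is_filtered_out` and the indicator name `filtered_flag` are hypothetical and are not the names used by SAS Enterprise Miner.

```python
def export_table(rows, is_filtered_out, mode="Filtered"):
    """Return the rows that a given export setting would pass on."""
    if mode == "Filtered":   # rows that survive the filter (the default)
        return [r for r in rows if not is_filtered_out(r)]
    if mode == "Excluded":   # only the rows that the filter removed
        return [r for r in rows if is_filtered_out(r)]
    if mode == "All":        # every row, plus an indicator column
        return [{**r, "filtered_flag": int(is_filtered_out(r))} for r in rows]
    raise ValueError(f"unknown export mode: {mode!r}")


rows = [{"x": 1}, {"x": 100}, {"x": 2}]


def too_large(row):
    """Hypothetical filter rule: treat values above 50 as outliers."""
    return row["x"] > 50


print(export_table(rows, too_large))              # [{'x': 1}, {'x': 2}]
print(export_table(rows, too_large, "Excluded"))  # [{'x': 100}]
print(export_table(rows, too_large, "All"))
```

With `"All"`, downstream steps can still separate the two groups by reading the indicator column.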

The ‘Tables to Filter’ option allows you to specify whether only the training data set is filtered or whether all data sets are filtered.

The ‘Distribution Data Sets’ option allows you to specify whether the data sets used for interactive filtering should be created at training time. These data sets are used for histograms and bar charts, which you may use in further analysis.

Class variables, by default, are filtered by Rare Values (Percentage) with a minimum percentage cutoff of 0.01%. This excludes observations whose class value occurs in less than 0.01% of the data. By default, normalised class values are used and missing class variable values are kept.
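A rare-value filter of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the node's implementation: `rare_value_filter` and the `league` column are hypothetical, and computing the percentage over all rows (including missing ones) is an assumption. The example uses a 10% cutoff instead of 0.01% so that the effect is visible on 20 rows.

```python
from collections import Counter


def rare_value_filter(rows, column, min_pct=0.01, keep_missing=True):
    """Keep rows whose class level makes up at least min_pct percent of all rows."""
    total = len(rows)
    counts = Counter(r[column] for r in rows if r[column] is not None)
    kept = []
    for r in rows:
        value = r[column]
        if value is None:
            if keep_missing:          # missing values are kept by default
                kept.append(r)
        elif 100.0 * counts[value] / total >= min_pct:
            kept.append(r)
    return kept


# Hypothetical data: level "C" appears in only 1 of 20 rows (5%).
levels = ["A"] * 14 + ["B"] * 4 + ["C"] * 1 + [None]
rows = [{"league": v} for v in levels]

kept = rare_value_filter(rows, "league", min_pct=10.0)
print(len(rows), len(kept))  # 20 19
```

With the default cutoff of 0.01%, a level would have to occur fewer than once in 10,000 rows to be excluded, so the default only has an effect on fairly large tables.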

Interval variables are filtered using Standard Deviations from the Mean, with missing values also being kept.
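The standard-deviation rule can be sketched as follows. The cutoff of 3 standard deviations, the use of the sample standard deviation, and the helper names `std_limits` and `std_filter` are assumptions made for illustration; the text above does not state the cutoff, and the data are invented.

```python
from statistics import mean, stdev


def std_limits(values, cutoff=3.0):
    """Return (lower, upper) limits at mean +/- cutoff sample standard deviations."""
    present = [v for v in values if v is not None]
    centre, spread = mean(present), stdev(present)
    return centre - cutoff * spread, centre + cutoff * spread


def std_filter(values, cutoff=3.0, keep_missing=True):
    """Keep values inside the limits; missing values (None) are kept by default."""
    lower, upper = std_limits(values, cutoff)
    kept = []
    for v in values:
        if v is None:
            if keep_missing:
                kept.append(v)
        elif lower <= v <= upper:
            kept.append(v)
    return kept


# Hypothetical data: nineteen ordinary values, one extreme value, one missing value.
values = [10.0] * 19 + [100.0, None]

lower, upper = std_limits(values)
kept = std_filter(values)
print(len(values), len(kept))  # 21 20
```

Because the limits are computed from the same data that contains the outlier, a single extreme value can only be detected this way when the sample is reasonably large; with very few observations the outlier inflates the standard deviation enough to stay inside the limits.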

To run the Filter Node, right-click on the last node in the sequence and select Run. A green tick indicates that the node has run successfully:

Running the Filter Node with the default settings excluded 44 observations from the training data set.

The class variables that have been removed are as below:

The limits that were used for the interval variables are also displayed in the results window:

Which sampling method is used by default in SAS Enterprise Miner interactive exploration windows?

The SAS Enterprise Miner interactive decision tree data sampling algorithm performs a random sample of the input training and validation data by default. This sampling is automatic and does not require user input for activation.

What is SAS Enterprise Miner?

SAS® Enterprise Miner™ streamlines the data mining process and lets you create predictive and descriptive models based on analytics. SAS Enterprise Miner helps you analyze complex data, discover patterns and build models so you can more easily detect fraud, anticipate resource demands and minimize customer attrition.

What is SAS data mining tool?

SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language.

Which type of node in Enterprise Miner is used to prevent overfitting?

The purpose of the Data Partition node is to partition, or split, the source data into training, validation, and test data sets. Splitting the original source data set into separate data sets helps to prevent overfitting and to achieve good generalization of the resulting statistical model.
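The idea of partitioning can be sketched as a simple random split in Python. This is only an illustration of the concept and not the node's actual algorithm; the `partition` helper, the 40/30/30 proportions and the seed are assumptions chosen for the example.

```python
import random


def partition(rows, fractions=(0.4, 0.3, 0.3), seed=12345):
    """Randomly split rows into (train, validation, test) by the given fractions."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # whatever remains goes to the test set
    return train, valid, test


rows = list(range(100))            # hypothetical observation IDs
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))  # 40 30 30
```

A model is fitted on the training rows only; the validation and test rows are held back so that its performance is measured on observations it has not seen.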
