Which SAS Enterprise Miner tool would you use to exclude extreme outliers from your analysis?

Chapter 1: Introduction

1.1 Introduction to SAS Enterprise Miner

SAS Enterprise Miner

The SAS Enterprise Miner interface simplifies many common tasks associated with applied analysis. It offers secure analysis management and provides a wide variety of tools with a consistent graphical interface. You can customize it by incorporating your choice of analysis methods and tools.

SAS Enterprise Miner: Interface Tour

Menu bar and shortcut buttons

The interface window is divided into several functional components. The menu bar and corresponding shortcut buttons perform the usual windows tasks, in addition to starting, stopping, and reviewing analyses.

Project panel

The Project panel manages and views data sources, diagrams, results, and project users.

Properties panel

The Properties panel enables you to view and edit the settings of data sources, diagrams, nodes, results, and users.

Help panel

The Help panel displays a short description of the property that you select in the Properties panel. Extended help can be found in the Help Topics selection from the Help main menu.

Diagram workspace

In the diagram workspace, process flow diagrams are built, edited, and run. The workspace is where you graphically sequence the tools that you use to analyze your data and generate reports.

Process flow

The diagram workspace contains one or more process flows. A process flow starts with a data source and sequentially applies SAS Enterprise Miner tools to complete your analytic objective.

Node

A process flow contains several nodes. Nodes are SAS Enterprise Miner tools connected by arrows to show the direction of information flow in an analysis.

SEMMA tools palette

The SAS Enterprise Miner tools available to your analysis are contained in the tools palette. The tools palette is arranged according to a process for data mining, SEMMA. SEMMA is an acronym for the following:

Sample: You sample the data by creating one or more data tables. The samples should be large enough to contain the significant information, but small enough to process.
Explore: You explore the data by searching for anticipated relationships, unanticipated trends, and anomalies in order to gain understanding and ideas.

Data manipulation is an important part of the data mining process. Filtering data and removing inaccurate or skewed variables can be important to ensure that accurate analysis is completed. SAS® Enterprise Miner™ includes two nodes created specifically for the purpose of removing variables.

This tip focuses on two nodes used for filtering and removing variables and how they can be used:

Drop Node
Filter Node

The Drop Node

The Drop Node can be used to remove any unnecessary variables from the Enterprise Miner data sets. Any of the following role types can be dropped from scored data sets:

Assess
Classification
Frequency
Hidden
Input
Rejected
Residual
Target

The Drop Node can be used within decision trees to trim the size of the data sets and metadata during the tree analysis.

The Drop Node can be found within the ribbon under the Modify tab.

The Drop Node can be dragged on to a SAS Enterprise Miner diagram and joined using an arrow to direct the flow of the data through the system.

The Drop Node allows you to specify the variables that you wish to remove from the SAS data set. To view the available options, click the Drop Node in the diagram; its properties are displayed in the left pane.

By default, the ‘Drop from Tables’ attribute is set to ‘No’. This means that any variables selected to be dropped are removed from the exported metadata only. If this value is set to ‘Yes’, the node creates data sets instead of views for the data specified.

Within the ‘Drop Selection Options’ you can choose, by role, the variables that you would like to drop from the analysis. The available roles are:

Assess
Classification
Frequency
Hidden *
Input
Predict
Rejected *
Residual
Target
Other

* Variables that have a role of Hidden and Rejected are dropped by default within the data set.

Within the Baseball data set the following roles have been set. On running the default settings within the Drop Node, we would expect that the logSalary variable would be dropped from the data set.

To run the Drop Node, right-click the last node in the sequence and select Run. A green tick shows that the node has run successfully.

On running the flow with the default settings, the following output log shows that one interval variable was discovered that had a role of rejected. This variable was removed from the data set.
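The role-based behaviour described in this section can be sketched outside SAS Enterprise Miner. The Python fragment below only illustrates the logic and is not how the node is implemented: the function name `drop_by_role`, the metadata layout, the sample values and every variable name other than logSalary are assumptions made for this example.

```python
# Illustrative sketch only; this is not SAS Enterprise Miner code.
# drop_by_role, the metadata layout and the sample data are assumptions.

# Roles that the text says are dropped by default.
DEFAULT_DROP_ROLES = frozenset({"Hidden", "Rejected"})


def drop_by_role(metadata, rows, drop_roles=DEFAULT_DROP_ROLES, drop_from_tables=False):
    """Return (new_metadata, new_rows) after dropping variables by role.

    metadata: dict mapping variable name -> role
    rows: list of dicts, one per observation
    drop_from_tables: False mirrors the default 'No' (metadata only);
                      True also removes the columns from the data.
    """
    dropped = {name for name, role in metadata.items() if role in drop_roles}
    new_metadata = {n: r for n, r in metadata.items() if n not in dropped}
    if drop_from_tables:
        new_rows = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    else:
        new_rows = rows
    return new_metadata, new_rows


# Hypothetical variables and values; only logSalary's Rejected role comes from the text.
metadata = {"nHits": "Input", "salary": "Target", "logSalary": "Rejected"}
rows = [{"nHits": 81, "salary": 475.0, "logSalary": 6.16}]

meta_only, rows_unchanged = drop_by_role(metadata, rows)
meta_both, rows_trimmed = drop_by_role(metadata, rows, drop_from_tables=True)
print(sorted(meta_only))
print(sorted(rows_trimmed[0]))
```

With the default settings only the metadata loses logSalary; passing `drop_from_tables=True` removes the column from the rows as well.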

The Filter Node

The Filter Node enables you to apply a filter to the training data set in order to exclude outliers or other observations that you do not want to include in your data mining analysis. Outliers can greatly affect modelling results and, subsequently, the accuracy and reliability of trained models.

Within SAS Enterprise Miner, the Filter Node can be found in the ribbon within the Sample tab.

The Filter Node can be dragged on to a SAS Enterprise Miner diagram and joined using an arrow to direct the flow of the data through the system.

The Filter Node can be used to remove missing values, to work with normalised values, or to customise the filtering method for both class and interval variables.

The ‘Export Table’ option allows you to specify which table to export after the node has been trained. This value can be set to one of the following:

Filtered: The default option, this allows the filtered data to be passed through as a view for further processing.
Excluded: This passes through any filtered out data as a view for further processing.
All: This passes all of the data through as a view and creates an indicator variable to identify any filtered records.
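The three export choices can be sketched as follows. This is a hedged illustration in Python, not the node's implementation: `export_table`, the `is_excluded` predicate and the indicator name `filtered_out` are hypothetical names chosen for this example.

```python
# Illustrative sketch only; export_table and "filtered_out" are assumed names.

def export_table(rows, is_excluded, mode="Filtered"):
    """Choose which observations to pass on, given a predicate marking excluded rows."""
    if mode == "Filtered":
        # Default: pass on only the observations that survive the filter.
        return [row for row in rows if not is_excluded(row)]
    if mode == "Excluded":
        # Pass on only the observations that the filter removed.
        return [row for row in rows if is_excluded(row)]
    if mode == "All":
        # Pass on everything and add an indicator for the filtered records.
        return [{**row, "filtered_out": is_excluded(row)} for row in rows]
    raise ValueError(f"unknown mode: {mode}")


def too_large(row):
    """Hypothetical filter rule used only for this demonstration."""
    return row["x"] > 100


rows = [{"x": 1}, {"x": 2}, {"x": 500}]

kept = export_table(rows, too_large)
removed = export_table(rows, too_large, "Excluded")
flagged = export_table(rows, too_large, "All")
```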

The ‘Tables to Filter’ option allows you to specify if you would like just the training data set filtered or if you would like all data sets filtered.

The ‘Distribution Data Sets’ option allows you to specify whether the data sets used for interactive filtering should be created at training time. These data sets are used for histograms and bar charts, which you may use in further analysis.

Class variables, by default, are filtered by Rare Values (Percentage) with a minimum cutoff for percentage of 0.01%. This excludes observations whose class level occurs in less than 0.01% of the data. The default also keeps any normalised or missing class variable values.
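A rare-value rule of this kind can be sketched in Python. This is only an approximation of the idea under stated assumptions: the functions `rare_levels` and `filter_rare` and the `league` variable are hypothetical, and computing the percentage over non-missing observations is an assumption rather than documented node behaviour.

```python
# Illustrative sketch only; function names and the sample data are assumptions.
from collections import Counter


def rare_levels(values, min_percent=0.01):
    """Return class levels whose share of non-missing values is below min_percent (%)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(present)
    return {level for level, n in counts.items() if 100.0 * n / total < min_percent}


def filter_rare(rows, variable, min_percent=0.01, keep_missing=True):
    """Drop observations whose level of `variable` is rare; keep missing values if asked."""
    rare = rare_levels([row[variable] for row in rows], min_percent)
    kept = []
    for row in rows:
        value = row[variable]
        if value is None:
            if keep_missing:
                kept.append(row)
        elif value not in rare:
            kept.append(row)
    return kept


# 20,000 non-missing observations: level "Z" appears once (0.005%), below the 0.01% cutoff.
rows = (
    [{"league": "A"}] * 12000
    + [{"league": "B"}] * 7999
    + [{"league": "Z"}]
    + [{"league": None}]
)
rare = rare_levels([row["league"] for row in rows])
kept = filter_rare(rows, "league")
```

Note that a cutoff as small as 0.01% only has an effect on fairly large tables: a level that occurs once is rare only when there are more than 10,000 observations.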

Interval variables are filtered using Standard Deviations from the Mean, with missing values also being kept.
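The standard-deviation rule can be sketched in Python as below. This is a hedged illustration, not the node's code: the function names, the cutoff of three standard deviations and the use of the sample standard deviation are all assumptions made for this example.

```python
# Illustrative sketch only; names, the 3.0 cutoff and the sample standard
# deviation are assumptions, not documented Filter Node behaviour.
from statistics import mean, stdev


def sd_limits(values, n_std=3.0):
    """Return (lower, upper) limits at n_std standard deviations from the mean."""
    present = [v for v in values if v is not None]
    centre = mean(present)
    spread = stdev(present)
    return centre - n_std * spread, centre + n_std * spread


def filter_by_sd(values, n_std=3.0, keep_missing=True):
    """Keep values inside the limits; missing values are kept when keep_missing is True."""
    low, high = sd_limits(values, n_std)
    kept = []
    for v in values:
        if v is None:
            if keep_missing:
                kept.append(v)
        elif low <= v <= high:
            kept.append(v)
    return kept


# Nineteen ordinary values, one extreme value and one missing value.
values = [float(v) for v in range(1, 20)] + [1000.0, None]
low, high = sd_limits(values)
kept = filter_by_sd(values)
```

Because the extreme value itself inflates the mean and the standard deviation, a single outlier in a very small sample may not exceed the limits; the rule works best on reasonably large data sets.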

To run the Filter Node, right-click the last node in the sequence and select Run. A green tick shows that the node has run successfully.

Running the Filter Node with the default settings excluded 44 observations from the training data set.

The results window lists the class variable values that have been removed, together with the limits that were used for the interval variables.

Which sampling method is used by default in SAS Enterprise Miner interactive exploration windows?

The SAS Enterprise Miner interactive decision tree data sampling algorithm performs a random sample of the input training and validation data by default. This sampling is automatic and does not require user input for activation.

What is SAS Enterprise Miner?

SAS® Enterprise Miner™ streamlines the data mining process and helps you create predictive and descriptive models based on analytics. It helps you analyze complex data, discover patterns and build models so you can more easily detect fraud, anticipate resource demands and minimize customer attrition.

What is the SAS data mining tool?

SAS is a software suite that can mine, alter, manage and retrieve data from a variety of sources and perform statistical analysis on it. SAS provides a graphical point-and-click user interface for non-technical users and more advanced options through the SAS language.

Which type of node in Enterprise Miner is used to prevent overfitting?

The purpose of the Data Partition node is to partition, or split, the source data into training, validation, and test data sets. Splitting the original source data set into separate data sets helps to prevent overfitting and to achieve good generalization in the resulting statistical model.
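The idea of a three-way split can be sketched in Python. This is a simple random partition under stated assumptions, not the Data Partition node's algorithm: the function name `partition`, the 40/30/30 proportions and the seed are illustrative choices.

```python
# Illustrative sketch only; partition, the proportions and the seed are assumptions.
import random


def partition(rows, train=0.4, validation=0.3, seed=12345):
    """Randomly split rows into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_valid]
    testing = shuffled[n_train + n_valid:]  # whatever remains is the test set
    return training, validating, testing


training, validating, testing = partition(range(100))
```

The model is fitted on the training set, tuned against the validation set and finally assessed on the test set, which it has never seen; that separation is what guards against overfitting.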

