
On-boarding Data in Splunk


This chapter outlines the most critical feature of Splunk: adding data to it. Using Splunk 8.1, we will go over the HTTP Event Collector, the JSON and REST API functionality for IoT event collection. Then we will cover the different interfaces and data on-boarding choices in Splunk, and also learn how to handle event segmentation and improve the data input process.

Deep diving into various input methods and sources

Splunk supports several methods of ingesting data into its server. Any human-readable, machine-generated data from various sources can be submitted using input methods such as files and directories or TCP/UDP ports, and forwarded to the Splunk Enterprise server so that insights and analytics can be obtained from it.

Data sources in Splunk

Uploading data into Splunk is one of the core parts of data visualization and analysis. If data is not accurately parsed, timestamped, or broken into events, interpreting it and obtaining adequate insight becomes challenging. Data from different realms, such as IT security, networking, mobile devices, telecom networks, media and entertainment devices, storage devices, and many others, can be analyzed and visualized in Splunk. It is essential to parse the information from these different sources, with their various forms and styles, into the right format to get appropriate insight from the machine data.

Splunk Enterprise provides built-in support for many common types and formats of machine-generated data. Most notably, if the data source is one of the supported types described below, previously saved settings and configurations are applied automatically in Splunk Enterprise. This helps parse the data into well-formed events with correct timestamps, so that searches, analytics, and improved visualizations are available more quickly.

Splunk Structured data

Machine-generated data is usually structured, and in some cases may be semi-structured. Structured data formats include Extensible Markup Language (XML), JavaScript Object Notation (JSON), comma-separated values (CSV), tab-separated values (TSV), and pipe-separated values (PSV).

Splunk can upload data in any of these structured formats. When data arrives in one of the forms mentioned earlier, the predefined configuration and setup can be applied explicitly by selecting the respective source type during data upload, or by setting it in the inputs.conf file.

The preconfigured settings are fairly generic for all of the previously mentioned structured formats. System logs are often custom-structured, and in that situation additional configuration is needed to parse the details.

XML, for instance, comes in different forms; two kinds are mentioned here. The first form has a <note> tag at the beginning and </note> at the end, with parameters and values in between. The second form has two levels of hierarchy: the XML is labelled with a <library> tag containing <book> tags, and the parameters and values sit between the <book> and </book> tags.

The first kind is this:

<note>
<to>Test</to>
<from> TEST2</from>
<heading>XML Format TEST</heading>
<body>1st format of XML!</body>
</note>
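
The second, two-level kind described above might be sketched as follows (the element names inside <book> are illustrative placeholders, not taken from the original):

<library>
<book>
<title>Test Book</title>
<author>Test Author</author>
</book>
</library>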

Likewise, there can be many kinds of machine-generated custom XML files. Splunk Enterprise offers built-in settings and source types for various kinds of structured data. For example, data obtained from web server logs is structured and might be in JSON, CSV, or plain text format. Splunk aims to make the user's job more straightforward by shipping the right configurations and settings for several different data sources.

Web servers, files, operating systems, network and security devices, and other applications and utilities are commonly used data sources.

Web and cloud services

Web services based on Linux are mostly hosted on Apache servers, while web services based on Windows are typically hosted on IIS. The logs created by Linux web servers are straightforward, while Microsoft IIS log files are available either in the W3C extended log file format or in the ODBC log file format.

Cloud services can be connected and configured to forward their data directly to Splunk Enterprise. The Splunk app store includes various technology add-ons for getting data from cloud services into Splunk Enterprise.

Thus, Splunk offers a preconfigured source type for importing log files from web servers such as Apache, which parses the data into the best format for viewing.

Suppose the user needs the Splunk server to upload Apache error logs; in that case, the apache_error source type can be selected while adding the data.


New configurations can be introduced according to our needs using the settings options in the subsequent Settings section. After the modifications are made, you can either save the settings as a new source type or overwrite the current source type with the new settings.

IT operations and network security

Splunk Enterprise has several applications on the Splunk app store that specifically target IT operations and network security. Splunk is a widely used tool for identifying and protecting networks and content, detecting fraud and theft, and reviewing and auditing user behavior. Splunk Enterprise offers built-in support for firewalls, Cisco syslog, Call Detail Records, logs from the Cisco Adaptive Security Appliance (ASA), and Snort, one of the most common intrusion detection tools. The Splunk app store has several add-ons for receiving data from monitoring devices such as firewalls, routers, DMZ devices, and so on. Splunk apps also offer visualizations and analysis of the data uploaded from security and IT devices.

Databases

Splunk Enterprise has built-in support for databases such as MySQL, Oracle syslog, and IBM DB2. Add-ons for the Oracle and MySQL databases can also be found in the Splunk app store. These add-ons can be used to retrieve, parse, and upload data from the respective database to the Splunk Enterprise server.

A single source can provide various types of data; take MySQL as an example. There may be error log data, query tracking data, server health and status data, or the MySQL data itself stored in databases and tables. In other words, a wide range of data can be produced by the same source, and Splunk supports all of these forms. There are built-in source types for MySQL error logs, MySQL slow queries, and MySQL database logs, which make it easier to handle the data generated by each.

Application and operating system data

Splunk has built-in source types for Linux dmesg, syslog, security logs, and several other logs available on the Linux operating system. Splunk also includes configuration settings for data input from Windows and OS X logs as well as the Linux OS. It also offers standard Log4j-based logging configurations for Java, PHP, and .NET business applications. Splunk supports various other applications as data sources, including Ruby on Rails, Catalina, WebSphere, and so on.

Splunk Enterprise delivers predefined configurations that enrich the data with improved parsing and event breaking, resulting in better interpretation of the available data for many applications, databases, operating systems, and cloud and virtual environments. For applications whose settings are not available in Splunk Enterprise, apps or add-ons from the app store can be used instead.

Data input methods

Splunk Enterprise supports various methods of data input. Data can be brought into Splunk through files and directories, TCP, UDP, scripts, or universal forwarders.


Files and directories

Splunk Enterprise offers a simple interface for uploading files and directories. Through the Splunk web interface, files can be imported manually as a one-time upload, or monitored for content updates so that any new data appended to the file is submitted to Splunk. Splunk can also be configured to upload several files, either all in a single shot or by watching a directory for new files so that data is indexed as it arrives. Splunk can upload data in any human-readable format from any source, i.e., no proprietary tools are required to interpret the data.

Splunk Enterprise also supports uploading files in compressed formats such as .zip and .tar.gz.
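
As a minimal sketch, a monitored directory can be configured in inputs.conf roughly as follows (the path, sourcetype, and index are assumptions for illustration):

[monitor://F:\Data\Logs]
sourcetype = myapp_logs
index = main
disabled = false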

Network sources

Splunk supports both TCP and UDP for fetching data from network sources. Splunk can listen on a network port and index the data that arrives on it. In general, it is suggested to use a universal forwarder to transfer data from network sources to Splunk, as the universal forwarder buffers the data and prevents data loss when there are problems on the Splunk server.
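
A minimal inputs.conf sketch for a network input might look like the following (the UDP port and sourcetype are assumptions):

[udp://514]
sourcetype = syslog
connection_host = ip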

Windows data

Splunk Enterprise offers direct configurations for accessing data from Windows machines. Both remote and local collection of many data types and sources from a Windows system is supported.


Splunk includes predefined inputs and configurations for event log, performance monitoring, registry, host, network, and print monitoring of local or remote Windows machines.

Data can thus be sent to Splunk from different sources over various mediums using the input method that best suits the data. Splunk apps and technology add-ons found in the Splunk app store can also be used to create new data inputs.

Adding data to Splunk

Splunk Enterprise has developed new interfaces to accept input from the resource-constrained, lightweight devices of the Internet of Things. Splunk Enterprise 8.1 supports the HTTP Event Collector along with REST and JSON APIs.

HTTP Event Collector is a convenient interface for sending data to the Splunk Enterprise server from an existing application without using a forwarder. HTTP APIs exist for .NET, Java, Python, and almost all programming languages, so transmitting data from an existing application written in any particular language is straightforward.

For example, assume that you are an Android application developer and you want to know which features consumers use, which areas cause discomfort or which screens create issues, and what the application's usage pattern looks like. You can then use the REST APIs in the code of your Android application to transfer logging data to the Splunk Enterprise server. The only thing to remember is that the data must be sent in a JSON payload envelope. The benefit of using the HTTP Event Collector is that data can be submitted directly to Splunk, and we can quickly gain insights, metrics, and visualizations without needing any third-party software or setup.

HTTP Event Collector and configuration

The HTTP Event Collector is configured from the Splunk web console, and its REST API is then used to index data sent over HTTP into Splunk.

HTTP Event Collector

HTTP Event Collector (HEC) provides an API endpoint for transmitting log data to Splunk Enterprise from applications. The Splunk HTTP Event Collector supports both HTTP and HTTPS, so connections can be secured.

The Splunk Enterprise HTTP Event Collector offers the following characteristics:

  • Memory and resource usage are very light, so it can be used even on resource-constrained, lightweight devices.
  • Events can be transmitted directly from anywhere, including web servers, smart devices, and IoT devices, without setting up or enabling forwarders.
  • It is a token-based JSON API, so no user credentials need to be stored in the code or application settings; the tokens used in the API handle authentication.
  • HEC can be set up quickly from the Splunk web console: enable the HTTP Event Collector, define a token, and you are ready to accept data into Splunk Enterprise. Both HTTP and HTTPS are supported, so traffic can be secured, and GZIP compression and batch processing are supported.
  • HTTP EC is highly scalable and can be used with a load balancer to collect and index millions of events per second in a distributed environment.

Configuration via Splunk Web

The steps for setting up HEC from Splunk Web are as follows:

  • Enable the Event Collector:
  • Open the Splunk web console and go to Settings | Data Inputs.
  • Click HTTP Event Collector on the Data Inputs page.
  • Click Global Settings in the upper-right corner of the HTTP Event Collector page.
  • The Edit Global Settings page then appears.

Select Enabled for the All Tokens option.

  1. Choose the appropriate source type based on the source the data comes from and the type of data. When a specific source type is chosen, the related configuration and event parsing settings are applied to the HEC data by default.
  2. If you want to manage HEC tokens using the deployment server, select the Use Deployment Server checkbox.
  3. Various other settings can be modified here, such as the index into which the data is uploaded, whether HTTP or HTTPS (the SSL option) is used, and the port number used as the endpoint.
  4. Click Save to apply the settings after adjusting the appropriate parameters.
  • Create a new token. New tokens can be generated either from the Splunk Add Data page or from the HTTP Event Collector page where the Global Settings were modified.

The following steps are taken to generate a new token on the global configuration page:

  1. Click the New Token button at the top right of the page. This takes the user to the HTTP Event Collector Add Data screen.
  2. This page asks for the Name, Source name override, and Description of the token.

Enter a name, a source name override, and a description for the token. Users can also set an Output Group, if any, and click Next.

  • The next page lets users select the source type and index. If a new index or source type is needed, it can be created from this page. After choosing an index and source type, click Next.
  • A summary page appears, where you can review all the configured inputs and settings for the new token, and then click the Submit button to create it.
  • After clicking Submit, the HTTP Event Collector token is created and displayed on the screen. This token is then used in the HTTP API to forward data to Splunk.

3. HTTP Event Collector verification: follow these steps to verify that the Event Collector is working:

  1. The following basic curl command can be used by developers to validate that the Event Collector and token are correctly configured.
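
A minimal sketch of such a command, assuming the server runs locally on the default HEC port 8088 and using a placeholder token, looks like this:

curl -k https://localhost:8088/services/collector -H "Authorization: Splunk xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -d '{"event": "HEC test event", "sourcetype": "manual"}'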

The following response to the preceding curl command confirms that the events were successfully uploaded to Splunk:

{"text": "Success", "code": 0}

  2. You can also verify the uploaded events by logging in to the Splunk server and searching on the chosen source type or index, as sketched below.
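
For instance, a simple search such as the following (the index and sourcetype are assumptions that should match the token settings) confirms that the events have arrived:

index=main sourcetype=manual | head 10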

Managing the Event Collector token

You can manage Event Collector tokens from the Splunk web console by going to the Settings menu and selecting Data Inputs. Click the HTTP Event Collector option on the Data Inputs page. This page lists all of the tokens that have been created.

You can create, edit, and delete tokens from the Data Inputs section. Click the Edit button next to the token that needs to be changed; parameters of the chosen token, such as Source type, Index, Output group, and others, can then be modified. If a token is not in use or not needed for some time, it may be disabled or deleted as required.

The JSON API format

Data sent to the HTTP Event Collector has to be in a particular format that Splunk Enterprise recognizes so that it can be processed correctly. The Splunk HTTP Event Collector accepts data sent as a sequence of JSON packets from separate sources. Each JSON packet consists of two parts: metadata and the actual data held in the event key. The metadata carries various parameters in key-value format, while the event key holds the real data.

The data sender is responsible for packaging the data in JSON format. Data can be bundled using the Splunk logging libraries available for Java and .NET, or with the Apache HTTP client for Java. You can also write scripts or code to encode the data in the format defined in the following sections.

Authentication

The Splunk Event Collector performs authentication with tokens. Before a data source can start transmitting data to the Splunk server, it must be authenticated and authorized. Authorization is done using a client-side authorization header: every JSON data packet carries the unique token in its authorization header. When HTTP EC validates the token, it accepts the data and sends a success response to the sender. The authorization header looks like this:

Authorization: Splunk xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Splunk Metadata

The following key-value pairs can be used in the event's metadata to override the settings specified in the token:

  • time: This key-value pair timestamps the event data. The time format defaults to epoch time in the format <seconds>.<milliseconds>.
  • host: This key specifies the hostname of the source and is very helpful for locating the machine the data originated from. In an IoT scenario, the hostname can uniquely identify the individual device that produced the data.
  • source: This key defines the source name. It can be used along with the host to identify the data source in a multi-source Splunk deployment.
  • sourcetype: This key defines the data type so that the appropriate parsing and event processing can be applied for that type of data or data source.
  • index: This is the index into which the event will be uploaded on Splunk.

Note  

The previous keys are all optional. If the metadata is not explicitly defined, the default settings from the token are applied.

The following is an example of metadata:

{
"time": 1211140112,
"host": "172.16.10.13",
"source": "host1",
"sourcetype": "csv",
"index": "IOP"
}

Event data

Event data is assigned to the event key and contains the actual data, to be tracked and visualized, that needs to be submitted to Splunk. Event data is also expressed in JSON format.

An example of event data is as follows:

"event":

{
"7.00": "32",
"11.00": "35",
"15.00": "33",
"19.00": "29"
}

The following is a complete JSON packet with metadata and event data:

{ "time": 1211140112, "host": "172.16.10.13", "source": "host1", "sourcetype": "csv", "index": "IOP", "event": { "7.00": "32", "11.00": "35", "15.00": "33", "19.00": "29" } }
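
A sketch of sending this packet with curl might look like the following (the server address, default HEC port 8088, and token are placeholders):

curl -k https://splunkserver.example.com:8088/services/collector -H "Authorization: Splunk xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -d '{ "time": 1211140112, "host": "172.16.10.13", "source": "host1", "sourcetype": "csv", "index": "IOP", "event": { "7.00": "32", "11.00": "35", "15.00": "33", "19.00": "29" } }'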

When the HTTP Event Collector receives the previous JSON packet, it parses the data and sends it to the indexers for event processing. The metadata is used to apply any specific settings and configurations to the received event before it is submitted to the indexer.

To promote the use of Splunk for IoT devices, Splunk released these HTTP- and JSON-based REST APIs. Several routing and messaging protocols and message brokers are commonly used in IoT, such as CoAP, MQTT, Apache Kafka, JMS, AMQP brokers, and Amazon Kinesis, the cloud streaming service. The Splunk app store provides ready-to-use modular input technology add-ons for these protocols, and the respective modular inputs can be used along with HTTP EC to upload data to Splunk Enterprise.

Splunk Data processing

Data processing plays an important role in parsing and enriching data so that insights can be generated faster and the data visualized with the necessary analytics. Data processing involves the configuration of events, timestamps, and hosts.

Splunk Event configuration

Any data submitted to Splunk is treated as an event. An event may be anything from log activity, error reports, or user logs to other machine-generated data from a computer, server, or another source. Events are the units Splunk uses to build visualizations and gain insight into the source, so it is important to process events correctly depending on the data and its source. The settings and configurations of the processed events can then be stored in the source type for later reuse.

Splunk character encoding

In pursuit of internationalization, Splunk Enterprise supports several languages. The default character set encoding on Splunk Enterprise is UTF-8, although there is built-in support for numerous other widely used encodings. If the data is not UTF-8 or contains non-ASCII characters, Splunk attempts to convert it to UTF-8 unless the user specifies in the Splunk configuration file that it should not be converted.

Splunk supports many different character sets but uses UTF-8 by default. If the data uses a different encoding, configuration is needed in the props.conf file. In props.conf, the following block causes data uploaded from the MYData host to be parsed using a Russian (ISO-8859-5) character set encoding:

[host::MYData]
CHARSET=ISO-8859-5

Splunk also provides automatic detection and classification of character set encodings. If the data submitted to Splunk for a specific source type or host contains a mixture of different character sets, Splunk's algorithm will automatically identify the encoding and apply it to the data appropriately. To force the source to auto-detect the encoding instead of using the default UTF-8 character set, props.conf needs to be updated with the following lines:

[host::MYData]
CHARSET=AUTO

Event line breaking

A single event can consist of a single line, part of a line, or several lines. The Splunk Enterprise engine can detect event boundaries automatically. However, because there are many different data types and formats, the automatic detection might not break the data into events adequately. Manual line breaking configuration may be needed if the automatic line breaking does not correctly identify multi-line events.

Event line breaking can be customized based on a regular expression, a specific word that occurs at the beginning of each new event, a particular word that ends the event, the appearance of a new date or time, and so on.

The following attributes for event line breaking can be configured from Splunk Web while uploading data, or set in the props.conf file:

  • TRUNCATE=<NUMBER>: This attribute accepts the number of bytes after which a line is truncated. It is helpful when only the first part of a long line is needed; data that is not required can be cut off with this attribute.

For instance, TRUNCATE=500 truncates a line after 500 bytes of data, so any line longer than 500 bytes is cut at that point. Truncation is commonly used to avoid memory issues and search slowdowns, and to prevent the indexing of useless data.

  • LINE_BREAKER=<REGULAR_EXPRESSION>: This attribute breaks the event when the given regular expression occurs. Whenever the regular expression is found, the data that follows is treated as a new event.
  • SHOULD_LINEMERGE = [true or false]: This attribute merges several lines into a single event until one of the following attributes is satisfied:
  • BREAK_ONLY_BEFORE_DATE = [true or false]: When a new line containing a date is found, the data is marked as a new event.
  • MUST_NOT_BREAK_AFTER = <REGULAR_EXPRESSION>: Splunk does not break the event after the given regular expression until an occurrence satisfying the MUST_BREAK_AFTER condition is found.
  • BREAK_ONLY_BEFORE = <REGULAR_EXPRESSION>: A new event is created only when the given regular expression is found at the start of a new line.
  • MAX_EVENTS = <NUMBER>: This number determines the maximum number of lines that a single event can contain.
  • MUST_BREAK_AFTER = <REGULAR_EXPRESSION>: When the given regular expression matches the current line, Splunk creates a new event for the next input line.

The following is an example of the event breaking setup in the props.conf file:

[MyData]
SHOULD_LINEMERGE = true
MUST_BREAK_AFTER = </data>

This props.conf entry instructs Splunk to break events for the MyData source type whenever </data> occurs.
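
For illustration, hypothetical raw data of the following shape would then be merged into one event per <data>...</data> block (the field names are made up):

<data>city=Delhi temp=32
humidity=40</data>
<data>city=Jaipur temp=29
humidity=55</data>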

Splunk Timestamp configuration

The timestamp is one of the most significant data parameters. It makes it possible to build time-based visualizations and insights, i.e., the number of failures and crashes that happened in one day, in the last 10 minutes, in the previous month, and so on. A timestamp is needed to compare data over time, build time-based visualizations, run searches, and so on. The data we upload from multiple sources may or may not carry a timestamp. Data that has a timestamp should be parsed with the right timestamp format; for data without a timestamp, Splunk automatically applies one during upload so that time-based visualizations still work.

Splunk assigns a timestamp to the data based on multiple conditions, such as the timestamp configuration in props.conf. If no settings are found in props.conf, the timestamp format of the source type is checked against the events. If there is still no timestamp for the event, Splunk attempts to derive a date from the source or filename; if no date can be identified, it applies the current time to the event. In most cases no special modification is needed, as Splunk tries nearly all possible alternatives to assign the data a timestamp.

In some instances, if the timestamp is not correctly parsed, it can be configured on the Source Type Settings page during data upload, or it can be manually configured in the props.conf file.

Here are the attributes that can be specified for timestamp configuration in props.conf:

  • TIME_PREFIX = <REGULAR_EXPRESSION>: This looks for a particular regular expression that precedes the timestamp. For instance, if the timestamp in your data appears after a FORECAST tag, use that as the TIME_PREFIX.
  • MAX_DAYS_HENCE = <NUMBER>: Only data with a date less than this number of days in the future is imported. For example, the default value of this attribute is 2, so data dated more than two days ahead of the current day is skipped and not uploaded.
  • MAX_TIMESTAMP_LOOKAHEAD = <NUMBER>: This attribute defines how far into the event Splunk looks for the timestamp. For example, if events break on every line and the timestamp always appears within the first 15 characters of the event, MAX_TIMESTAMP_LOOKAHEAD=15 should be configured.
  • TIME_FORMAT = <STRPTIME_FORMAT>: The strptime format of the timestamp.
  • TZ = <TIMEZONE>: Specifies the time zone, either as an offset such as +5:30 or as a time zone name. For instance, TZ=+5:30 defines India's time zone.
  • MAX_DAYS_AGO = <NUMBER>: This parameter is helpful if you do not want to upload older data to Splunk. If you are interested in uploading only the last month of data, set this attribute accordingly and any data older than the specified number of days will not be submitted to Splunk. The default value is 2000 days.

Let's look at an example of timestamp extraction.

The following props.conf configuration looks for a timestamp after the term FORECAST:, parses it with the given strptime format in the Asia/Kolkata time zone (+5:30), and does not allow data that is more than 30 days old to be uploaded:

[MyData]
TIME_PREFIX = FORECAST:
TIME_FORMAT = %b %d %H:%M:%S %Z%z %Y
MAX_DAYS_AGO = 30
TZ= Asia/Kolkata
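
As an illustration, a hypothetical raw event such as the following would have its timestamp extracted from the text after FORECAST: using the strptime format above (the field values are made up):

FORECAST: Oct 12 10:15:30 IST+0530 2020 city=Delhi temp=32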

Host configuration

The hostname, or host, is the name used to describe the origin of the data uploaded to Splunk. It is a default field, and Splunk assigns a host value to all incoming data. The default host value is normally the hostname, IP address, file path, or TCP/UDP port number of the source that generates the data.

Consider an example where we have uploaded data from four separate web servers: Delhi, Jaipur, Host1, and Bangalore. All the data comes from web servers, so it arrives under the same source type, and without a distinguishing field it would be impossible to gain insight into one single location. The hostname lets the data be identified by its source and used to produce views and insights specific to that source. When a user is interested in the number of crashes, server downtime, and other observations for only one web server, the hostname assigned to that particular web server acts as a filter on the data set.

As stated previously, Splunk automatically attempts to assign a hostname during data processing if one has not been defined or set by the user. In certain cases, a manual hostname configuration may be needed to provide better analysis and visualization.

From inputs.conf, the default host value may be set as follows:

[default]
host = <string>

Setting host to a <string> assigns that value as the default host; if it is not set, Splunk retains the source's IP address or domain name as the host.

Tip

Do not place quotation marks (") around the host value. For example, host = Host1 is correct, but host = "Host1" is an incorrect way to assign a host in a .conf file.

In a widely distributed system, data may be uploaded through a forwarder or from a directory path. It may be appropriate to assign the hostname based on the directory the data comes from or based on the events themselves. Splunk can be configured to handle such situations by assigning the host name statically or dynamically from a directory structure or from the event data.

Configuring a static host value – files and directories

This approach is helpful when a single host value is to be applied to all data obtained from a particular file or directory. The single host value for data from a given file or directory can be specified using the following methods.

Let's look at the web console method first:

  1. In the Splunk web console, navigate to Data Inputs | Files & Directories.
  2. To customize the host attribute, choose the respective input to update it, or create a new input if one does not yet exist.
  3. Choose Constant value under the Set Host drop-down menu and type the hostname you want to assign to the input source into the Host field value text box.
  4. Click Save to apply the settings.

Splunk will now ensure that the specified host value is assigned to any data imported from the configured data input.

Now let's see the config file method. Static host values can also be set manually by editing the inputs.conf file:

[monitor://<path>]
host = <hostname>

For an existing data input, the host value will be replaced so that the new host value is assigned to all data uploaded in the future. If the input does not yet exist, it can be created in the same manner as the previous entry, with the file or directory path from which the data is uploaded.

Here is an example: the following inputs.conf configuration ensures that any data uploaded from the Data folder on the F drive gets the host value TestHost:

[monitor://F:\Data]
host = TestHost

Configuring a dynamic host value – files and directories

This configuration is useful when we rely on the file name or a regular expression on the source path to distinguish data from different hosts. It is generally helpful when archived data is imported into Splunk and the filename carries host metadata, or when a single forwarder fetches data from a variety of sources and uploads it to Splunk.

Suppose that Splunk uploads the data from the following folders:

F:\Data\Ver\4.4
F:\Data\Ver\4.2
F:\Data\Ver\5.1

Suppose that, in this scenario, the 4.4 folder should be uploaded with the host Kitkat, the 4.2 folder with Jellybean, and the 5.1 folder with Lollipop. A dynamic host value configuration is required.

Following are the steps for the web console approach:

  1. In the Splunk web console, navigate to Data Inputs | Files & Directories.
  2. To customize the host attribute, choose the respective input to update it, or create a new input if one does not yet exist.
  3. The following choices are available under the Set Host drop-down menu:
  4. Regex on path: Pick this option if the hostname is to be extracted from the path using a regular expression.

The previous example can be implemented with this technique by setting the regular expression text box to F:\Data\Ver\(\w+).

  • Segment in path: This option is useful when a segment of the path is used as the host value.

You can also implement the previous example by selecting this option and setting the segment number text box to 4; in F:\Data\Ver\4.4, 4.4 is the fourth segment of the path.

  5. Click Save to apply the settings.

With the config file approach, dynamic host values can also be configured manually by updating the inputs.conf file. For the previous example, inputs.conf looks like this:

Regex on Path:

[monitor://F:\Data\Ver]
host_regex =F:\Data\Ver\(\w+)

OR

Segment in Path:

[monitor://F:\Data\Ver]
host_segment = 4

Configuring a host value – events

Splunk Enterprise facilitates the assignment of different hostnames based on the data in events. When data comes via a forwarder, or from the same file or directory so that file and directory paths cannot be used to distinguish hosts, event-based host configuration plays a very important role. Event-based host settings are made by changing the configuration files, as shown next.

The following parameters should be set in the transforms.conf file under a unique stanza name, which will then be referenced from the props.conf file:

[<name>]
REGEX = <regex>
FORMAT = host::$1
DEST_KEY = MetaData:Host

In addition to transforms.conf, the props.conf file has to be configured accordingly with a source or source type stanza whose TRANSFORMS parameter references the same stanza name used in transforms.conf.

The props.conf block should look as follows:

[<source or sourcetype>]
TRANSFORMS-<class> = <name>

Suppose data is imported into Splunk Enterprise from test.txt, which contains events from various sources, and the user needs to assign different hosts depending on the event content when uploading them to Splunk.

To assign host values based on the content of the related events, transforms.conf and props.conf are configured as follows:

# For the transforms.conf file

[EventHostTest]
REGEX = Event\soriginator:\s(\w+\-?\w+)
FORMAT = host::$1
DEST_KEY = MetaData:Host

# For the props.conf file

[TestTXTUpload]
TRANSFORMS-test= EventHostTest

The previous setup ensures that the host value is assigned based on the regular expression match.
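
For example, a hypothetical event line such as the following would be assigned the host value web-server01 by that regular expression:

Event originator: web-server01 login failed for user admin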

All of the host configuration methods explained previously must be applied before Splunk uploads the data. If data has already been uploaded to Splunk, it can be deleted and reindexed, or tags can be created for the incorrect host values. For data that has already been indexed, using lookup tables is another common approach.

Managing event segmentation

Splunk splits the uploaded data into events, and those events are further segmented at index time and at search time. In essence, segmentation breaks events down into smaller units, which are classified as major (broad) and minor (small) segments. Segmentation can be explained with the help of the following example.

A full IP address, for instance, is a major segment, and that broad segment can be split into several minor segments such as its individual octets.

Search-time segmentation affects search speed and the ability to build searches by clicking on parts of search results in Splunk Web, while index-time segmentation affects storage space and indexing speed, so the segmentation type can be chosen according to these criteria. Splunk also allows segmentation to be configured for a specific host, source, or source type.

Three types of event segmentation can be set for both index time and search time:

  • Inner segmentation: In this type, events are divided into the smallest segments. Inner segmentation leads to fast indexing and searching and less disk usage, but it degrades the search-term lookahead (typeahead) features in the Splunk Web console.
  • Outer segmentation: Outer segmentation is the reverse of inner segmentation. Major segments are not broken down into minor segments. It is less efficient than inner segmentation but more efficient than full segmentation. It also limits the ability to click on parts of search results in the Splunk Web console to build searches.
  • Full segmentation: Full segmentation is a combination of inner and outer segmentation; it keeps both major and minor segments. It is the least efficient indexing option but the most flexible for searching.

By default, Splunk Enterprise is configured in segmenters.conf under $SPLUNK_HOME/etc/system/default to use an indexing scheme that combines inner and outer segmentation, with full segmentation at search time.

To configure event segmentation for a particular host, source, or source type, props.conf can be modified with the following blocks. The SEGMENTATION attribute accepts the values inner, outer, none, and full.

Index-time segmentation:

[<source|sourcetype|host>]
SEGMENTATION = <segmentation_type> # the segmentation type can be inner, outer, none, or full

For better clarity, refer to the following example:

[TestTXTUpload]
SEGMENTATION = Outer

Search-time segmentation:

[<source|sourcetype|host>]
SEGMENTATION-<segment_selection> = <segmentation_type> # the segment selection can be full, inner, outer, or raw; the segmentation type can be inner, outer, none, or full

For example:

[TestTXTUpload]
SEGMENTATION-full = Outer

Improving the data input process

Data input is a very critical process that comes before any insights can be generated from the data, so proper indexing, parsing, processing, and segmentation of the data is essential. The first approach or setting a user tries may not work, and a trial-and-error procedure may be required to find the right settings for data types for which no default configurations are available in Splunk.

It is often advisable to first upload a limited amount of data to a test index rather than directly to a Splunk production site. Once the data appears in Splunk as properly formatted events and searches produce the necessary visualizations, the input can be pointed to the correct index and source type on the production server.

When testing, you may import the same file more than once to try out different event configurations, and Splunk will often refuse to index it again because the filename or content has already been recorded (in the fishbucket) to prevent duplicates. In those situations, the index must be cleaned, or the index can be removed or disabled, using the following CLI commands:

Cleaning an index: splunk clean eventdata -index <index_name>
Deleting an index: splunk remove index <index_name>
Disabling an index: splunk disable index <index_name>

Rather than sending a stream of TCP or UDP data directly to Splunk, consider writing the data to a file and configuring Splunk to monitor that file. This helps prevent data loss when Splunk or the network is down, and the file can also be reindexed if the data ever needs to be removed and loaded again. If forwarder, TCP, UDP, or scripted data inputs are used, enable persistent queues to buffer the data; in case of problems, the data is held in a queue rather than lost.
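
A minimal inputs.conf sketch for enabling a persistent queue on a network input might look like this (the port and queue sizes are assumptions):

[tcp://9514]
sourcetype = syslog
queueSize = 1MB
persistentQueueSize = 100MB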

When data needs to be uploaded to Splunk Enterprise from remote machines, it is best to use Splunk forwarders. Forwarders send a heartbeat to the indexer every 30 seconds and, in the event of a communication failure, retain the data until the connection is restored.

When the data to be submitted to Splunk carries no timestamp, timestamp extraction can be disabled and Splunk set to use the upload time as the timestamp. Disabling timestamp lookups on data without a timestamp significantly speeds up processing. To disable timestamp checking, add a [host::MyData] block with the DATETIME_CONFIG attribute set to NONE in props.conf.

For greater clarification, see the following example:

[host::MyData]
DATETIME_CONFIG = NONE

Data input is a major and essential part of working with Splunk. Some points to remember during the input process are as follows:

  • Figure out what data to upload to Splunk and which input method to use for it.
  • Use a universal forwarder where required.
  • Check the Splunk app store and, if necessary, use the available technology add-ons.
  • Apply the Common Information Model (CIM), which defines Splunk's standard host tags, fields, and event type tags for much of the IT data it processes.
  • Test the upload first in a test environment before rolling it out with the deployment server.

See the Splunk documentation for more information.

