IT360 Best Practices - A guideline
This section discusses some of the best practices for IT360. For the performance tuning aspects of IT360 best practices, please refer to: https://manageengineit360.wiki.zoho.com/IT360-Best-Practices-Guide.html
Meeting system requirements
Installing Central Server and Probes
- It is recommended to use a dedicated server to install the Central Server of IT360 [Enterprise Edition or MSP Edition]
- It is recommended to use a separate server for installing a Probe
- Install only one probe per server
- Central Server and Probes should not be installed on the same server
- However, in case you have to install the Central Server or a Probe in a Server box which has other applications running, it is advised to get in touch with our support team for a feasibility confirmation.
Using VMs
All editions of IT360 can be installed in a VM environment, provided the VM meets the System Requirements as per point 1 above. You may use VMs when you want to reduce the number of physical servers / boxes.
Be ready with the IT360 Prerequisites before Implementation
Network Speed / Connectivity
- when the IT360 installation [Professional Edition] and the monitored devices are remotely located
- when the Central Server and Probes or the Probes and the monitored devices are remotely located
Failover and MSSQL
If you plan to use the Failover option offered by IT360, it is recommended to use MSSQL as the backend database.
Understanding Clients' Environment
IP Address clash
Security and Connectivity and Device reachability concerns in a typical MSP environment
Support Information File and Screen shots
Backup and Disaster Recovery
Scalability considerations: Time to use a new Probe, Time to use a new Central Server
- Number of network interfaces [devices monitoring] = 8000
- Number of server / applications instances = 500
- Number of router interfaces for netflow monitoring = 200
- Number of network interfaces [devices monitoring] = 40,000
- Number of server / applications instances = 2000
- Number of router interfaces for netflow monitoring = 1000
- Number of Probes per Central Server = 50
Ensuring Continuity
When your requirement is to ensure uninterrupted monitoring and management of your network using IT360, it is good to use the Failover feature.
Managing Users and Roles in IT360
IT360 has several modules: the Network Devices module, the Netflow / Bandwidth module, the Applications and Servers module and the ServiceDesk module. You may want to give a number of Users restricted access to these modules in varying permutations. The Users and Roles feature helps you do this, and it is advised to understand this feature before configuring it. Please refer to the following link for details:
User Synchronisation frequency from Active Directory
Other best practices
Some of the other areas where you could do some fine tuning / optimisation / follow best practices are listed here:
- Log File configuration
- Database management
- Heap Size
- Cache...
- Polling Interval - Performance Data Collection interval and Availability Checks frequency
- Data Retention / Archiving
- It is suggested NOT to add all the licensed devices at a time. Instead, you may add about 20 to 30 devices at a time, check that the tool is functioning properly, and then move on to add another set of devices / applications.
FAQ and Troubleshooting
It is suggested that you strictly follow the hardware and software / OS requirements as outlined in the following link and thereby avoid product hanging / crashing issues.
To ensure a smooth user, functional and implementation experience, we recommend that you comply with the prerequisites outlined in our help document.
These mostly relate to
It is a good practice to keep a constant watch on the network / bandwidth speed with respect to the following scenarios:
If you are an MSP offering services using IT360, it is suggested that you understand your client's environment and discuss it with our support team, so that we can assist you in handling any situations specific to that client.
This happens typically in a multi-customer MSP environment that uses a shared IT360 Probe. When two machines have the same IP address, whether they belong to a single customer or to different customers, one of the machines will have to be monitored through a different Probe.
You may use the NAT support offered in IT360 when communication / reachability between the Central Server and Probes [or with the devices] needs to happen over the internet. This feature is particularly useful for MSPs.
Whenever you want to report a problem to our support team, it is a good practice to send us a screen shot and also the log files. Also, be sure to describe the scenario / environment / situation clearly. You may email us the complete Support Information File, which is generated automatically at the click of a hyperlink [the Support -- Create Support File link on the top right corner of the IT360 UI]. Where the log files and screen shots are huge, of the order of several MBs, we suggest that you ftp / upload them to one of our public servers, the details of which are available in the above mentioned Support page.
It is a good practice to take periodic backup of the databases and the configuration files. For details, please refer to the following link:
It is advised not to overload a Probe or a Central Server beyond its recommended scalability.
It is recommended to install / use a new probe when it crosses the following limits:
It is recommended to use a new Central Server, when the load on it crosses the following limits [sum of all the probes put-together]:
Please note that the load on the server also depends on the Polling interval, etc., and hence it is advisable to take the help of our Support team in arriving at the final numbers.
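As a rough illustration of the scalability limits listed in this guide, the following Python sketch flags the metrics on which a Probe or a Central Server has crossed its limit. The dictionary and function names are hypothetical and not part of IT360. The numbers are the ones listed in this guide, under the assumption that the first set of limits applies to a Probe and the second set to a Central Server. Because the real load also depends on the polling interval, treat the result as a prompt to contact the support team and not as a final sizing.

```python
# Hypothetical helper; the names are illustrative and not part of IT360.
# Limits are the ones listed in this guide (assumed mapping: first set = Probe,
# second set = Central Server, summed over all of its Probes).
PROBE_LIMITS = {
    "network_interfaces": 8000,
    "server_app_instances": 500,
    "netflow_router_interfaces": 200,
}
CENTRAL_LIMITS = {
    "network_interfaces": 40000,
    "server_app_instances": 2000,
    "netflow_router_interfaces": 1000,
    "probes": 50,
}


def exceeded_limits(load, limits):
    """Return the sorted names of the metrics whose load crosses the limit."""
    return sorted(name for name, limit in limits.items()
                  if load.get(name, 0) > limit)


# Example load figures (made up for illustration).
probe_load = {
    "network_interfaces": 8200,
    "server_app_instances": 310,
    "netflow_router_interfaces": 150,
}
print(exceeded_limits(probe_load, PROBE_LIMITS))  # ['network_interfaces']
```

An empty list means the load is within the listed limits; any name in the list suggests it may be time to add a Probe or a Central Server.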
The interval for synchronising or importing Users from Active Directory should not be less than 1 day. You may configure this at Admin -- General -- Active Directory -- Import Users from Active Directory.
Please refer to the following sections of the Help Documentation for FAQ and Troubleshooting [these could also be a source of best practices tips]:
Troubleshooting: http://www.manageengine.com/it360/help/meitms/troubleshoot.html
Applications and Servers - Best Practices
Using Downtime Scheduler
One way of improving the performance of the Applications and Servers module of IT360 is to use the Downtime Scheduler feature for those servers that have scheduled maintenance: the Data Collection Threads are freed during the scheduled downtimes and can be utilised by other monitors. [Note: Using the Downtime Scheduler feature also avoids the generation of alert notifications during the planned maintenance]
Polling Interval
IT360 offers two distinct features related to Polling:
- Polling for Health / Availability Check, which can be set to, say, 10 minutes. [This is less resource intensive and hence can be more frequent]
- Polling for Data Collection / Performance Polling, which is less frequent; say, you collect performance data after every 3 Health Check polls mentioned above. [This is more resource intensive and hence made less frequent]
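To make the relationship between the two kinds of polls concrete, here is a small Python sketch. The function names are hypothetical; the 10-minute interval and the ratio of 3 are the example values given above.

```python
def performance_interval_minutes(health_poll_minutes, polls_before_performance):
    """Minutes between two performance data collections."""
    return health_poll_minutes * polls_before_performance


def collections_per_day(interval_minutes):
    """How many times per day something that runs every interval_minutes fires."""
    return (24 * 60) // interval_minutes


health = 10   # minutes between Health / Availability polls (example value)
ratio = 3     # performance data collected after every 3 health polls (example value)

perf = performance_interval_minutes(health, ratio)
print(perf)                          # 30
print(collections_per_day(health))   # 144
print(collections_per_day(perf))     # 48
```

With these values, each monitor is checked for availability 144 times a day, while the more expensive performance collection runs only 48 times.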
Fault Management
Log file management
Socket Timeout
One way of reducing false alarms is to increase the socket timeout by changing the am.sockettest.timeout value present in <IT360_Home>/applications/conf/AMServer.properties. You may change the value from 5 to, say, 10.
Network Availability check
When you use IT360 for network device monitoring, alarms are generated when a device is not available / down. However, you may also see alarms when IT360 is out of network or if the proxy settings are not configured. To avoid such false alarms, you may use the Network Availability Check option, where you specify a dependable host which is unlikely to go unreachable / go down. Before reporting a monitored resource as down, IT360 checks the availability of this specified host, and it reports the monitored resource as down only if the host is reachable.
URL Availability Check
When you use IT360 for URL monitoring, alarms are generated when the URL is not available / down. However, you may also see alarms when IT360 is out of network or if the proxy settings are not configured. To avoid such false alarms, you may use the URL Availability Check option, where you specify a dependable URL which is unlikely to go down. Before reporting a monitored URL as down, IT360 checks the availability of this specified URL, and it reports the monitored URL as down only if the specified URL is reachable.
Consecutive Polls check
You may use this option to set the number of consecutive polls for which an error must persist before it is reported to the IT360 system for alarm notification, etc. You need to configure this at Admin -- Servers and Applications -- Alarm Settings -- Consecutive Polls Count.
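The availability check and the consecutive polls check can be pictured as one decision rule. The Python sketch below is only an illustration of that rule and not IT360's actual implementation; the function name and its parameters are hypothetical. It assumes that an alarm is suppressed whenever the dependable reference host or URL cannot be reached, and that otherwise an alarm is raised only after the configured number of failed polls in a row.

```python
def should_raise_alarm(recent_polls_failed, reference_reachable,
                       consecutive_polls_count=2):
    """Decide whether a failing monitor should be reported as down.

    recent_polls_failed: list of booleans, oldest first, True = poll failed.
    reference_reachable: result of the availability check against the
        dependable host or URL.
    consecutive_polls_count: failures in a row required before alarming.
    """
    if consecutive_polls_count < 1:
        raise ValueError("consecutive_polls_count must be at least 1")
    if not reference_reachable:
        # IT360 itself is probably cut off from the network: suppress the alarm.
        return False
    if len(recent_polls_failed) < consecutive_polls_count:
        return False
    return all(recent_polls_failed[-consecutive_polls_count:])


print(should_raise_alarm([True, True], reference_reachable=True))    # True
print(should_raise_alarm([True, True], reference_reachable=False))   # False
print(should_raise_alarm([False, True], reference_reachable=True))   # False
```

A single failed poll, or a failure seen while the reference host is unreachable, does not produce an alarm under this rule.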
You may also specify the size of the log files at <IT360_Home>/applications/working/conf/logging_parameters.conf
Networks - Best Practices
Ping Timeout
When ping timeout happens, the devices are not discovered. You may increase the Timeout value to try and discover the devices. The Timeout value can be set in the following file: <IT360_Home>/networks/conf/ping.properties
Identifying the devices by their Host Names
If you want to identify the devices by their Host Names instead of their IP Addresses, you need to set the DNS Server address in the machine that hosts IT360.
Creating new device templates and importing the new MIBs
IT360 supports monitoring of several hundreds of devices across vendors. The corresponding Device Templates and MIBs are already present in IT360 by default. However, if you want to monitor a New Device [which is outside of the Device Templates list], it is a good practice to create a New Device Template and import the relevant MIBs well ahead of its monitoring by IT360. This ensures that the New Device is placed under the correct device category.
Performance Tuning aspects of IT360
For Database tuning, Heap size, thread cache, query cache settings, etc., please refer to the following link. This link contains complete details about the tuning aspects of IT360 for optimal performance.
IT360 BEST PRACTICES GUIDE
This document gives the ManageEngine™ IT360 best practices recommended for optimum functioning.
It comprises the following items:
- System Requirements
- Applications and Servers
- Networks
- Console
1. System Requirements
The software and hardware requirements for the various editions of IT360 are available at the link below.
http://www.manageengine.com/it360/help/meitms/setup-meit360/system_requirements.html
2. Applications and Servers
- Increase data collection threads : The Applications & Servers module uses a separate thread for every data collection of a monitor. The default number of threads is controlled by the respective schedulers specified in <IT360_Home>/applications/working/conf/threads.conf . The default number of threads used for URL Sequence Monitoring is 5 and for other monitors is 12.
When the number of monitors is greater than 100, you can increase the data collection threads (default is 10) by editing the 'Data Collection' attribute in the file <IT360_Home>/applications/working/conf/threads.conf
The 'Data Collection' thread value can be increased to a value ranging between 20 and 25.
Similarly, when the number of URL Sequence monitors is large, in the range of 100, you can increase the URL monitoring thread count by editing the 'URL Monitor' attribute to 20 – 25. However, the total number of threads, including both URL monitor and other data collection threads, must not exceed 40.
Note: An increase in the 'Data Collection' threads will lead to a slight increase in CPU usage.
- Increase number of connections in database connection pool : When the number of monitors is greater than 100, the connection pool can be increased by editing the NON_TRANS_CONNECTIONS field in <IT360_Home>/applications/working/conf/database_params.conf
The default value of 6 can be set to 90 percent of the number of data collection threads.
Note: An increase in the 'Connection Pool' will lead to a slight increase in CPU usage.
- Downtime Scheduler : When you do not need monitoring to happen during a specific time period for some monitors, you can achieve this using the 'Downtime Scheduler' option available under the 'Admin' tab. Monitors configured for 'Downtime Scheduler' do not use the data collection threads during the specified period, which allows other monitors to utilize those threads and hence improves performance.
- Poll Intervals : When performance polling is set, only the availability and health check happens at every polling event, while other performance data is collected only at the end of the scheduled number of polls. This reduces the load on the system when you want to monitor only health and availability. This option is available under 'Admin' -> 'Performance Polling', where you can set the number of polls before collecting performance data for server monitors. We recommend a Poll Interval of 10 minutes for every monitor and a performance poll value of 3.
- Increase JVM heap size : Memory tuning can be achieved by editing the following parameters in the "wrapper.conf" file available under the directory <IT360_Home>/applications/working/conf
- wrapper.java.initmemory – Initial Java Heap Size
- wrapper.java.maxmemory – Maximum Java Heap size (Optimum recommended value is 1024)
- Network Availability check : When the IT360 Server is out of the network or is not connected to the network, the status of all the Monitors that are currently being monitored will be shown as 'Down'. You can avoid this by enabling the 'Check for Network Availability' option. When this option is enabled, IT360 will generate alerts for the unavailability of resources only if the specified host is reachable in the network. For example, let us assume that the system/host which runs IT360 has been isolated from the network. Enable this option and specify a hostname in the network (preferably not the hostname where IT360 runs). Now, IT360 tries to ping that machine to check its availability in the network. If it is not available, alerts are not generated and resources are not shown as down. You can also specify the IP of your routers, gateways, etc., to check whether the system/host which runs IT360 is present in the network.
- URL Availability check : When IT360 is out of the network or if external proxy settings are not configured, the status of all the URLs that are currently being monitored will be shown as 'Down'. You can avoid this by enabling the 'Check URL Availability' option. When this option is enabled, IT360 will generate alerts for the unavailability of a URL only if the other specified URL is up. For example, let us assume that the system/host which runs IT360 has been isolated from the network. Enable this option and specify another URL which is expected to be up always. Now, IT360 tries to monitor that URL for its availability. If it is not available, alerts are not generated and the URL is not shown as down. Further, a mail is sent to the configured mail address intimating the same.
- Consecutive polls check : You can use this option to set the number of consecutive polls for which an error must exist before it is reported to the system. The consecutive polls count in 'Admin' -> Action / Alert Settings can be increased from the default value of 1 to 2, so that alerts are generated only after two consecutive polls, which reduces false alarms.
- It is recommended to use SNMP or WMI mode for monitoring Windows machines, and SSH or Telnet for monitoring UNIX based machines. SSH is preferred because it is more secure than Telnet.
- Alerting on Monitor Polling problem :
- Connect to the IT360 Probe webconsole. Go to the Admin --> Servers & Applications --> Action / Alarm Settings link.
- In that page, select/check the "Monitor Error Mail" attribute and set a proper value for "consecutive polls before sending error" .
- We suggest setting the value for "consecutive polls before sending error" to 5 or more (>= 5). This is to avoid flooding of e-mail alerts.
- The e-mail alert notification will be sent to the e-mail address of the 'admin' user.
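The thread and connection pool guidance given earlier in this section can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, the returned keys simply mirror the attribute names mentioned in this guide, and rounding the 90 percent figure down to a whole number is an assumption. It does not read or write threads.conf or database_params.conf.

```python
MAX_TOTAL_THREADS = 40  # combined cap on 'Data Collection' + 'URL Monitor' threads


def plan_thread_settings(data_collection_threads, url_monitor_threads):
    """Validate a thread plan and derive the matching connection pool size."""
    total = data_collection_threads + url_monitor_threads
    if total > MAX_TOTAL_THREADS:
        raise ValueError(
            f"total threads {total} exceeds the cap of {MAX_TOTAL_THREADS}")
    return {
        "Data Collection": data_collection_threads,
        "URL Monitor": url_monitor_threads,
        # 90 percent of the data collection threads, rounded down (assumption).
        "NON_TRANS_CONNECTIONS": data_collection_threads * 9 // 10,
    }


print(plan_thread_settings(25, 5))
# {'Data Collection': 25, 'URL Monitor': 5, 'NON_TRANS_CONNECTIONS': 22}
```

A plan such as 25 data collection threads plus 20 URL monitor threads is rejected, because the total of 45 is above the cap of 40.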
3. Networks
- Before Discovery : The Network module relies on other communication protocols (SNMP, WMI, Telnet and SSH) for classification and monitoring. So make sure the following two configurations are done before triggering discovery:
- Configuring the relevant SNMP, WMI, and CLI credentials
- Defining Device Templates
- Configuring Discovery Parameters : The IT360 Network module pings the devices for discovery and later for determining availability, and 4 ping packets are sent by default. If there is network latency, it is possible that some devices are not discovered or, post discovery, are not polled for status. This can be addressed by configuring a few ping parameters.
Steps to achieve this:
- From the <IT360-Home>/networks/conf folder, open the file ping.properties.
- Un-comment the timeout parameter (remove the # symbol against it) and specify the ping timeout depending on the latency.
- Similarly, you can increase the number of ping retries by configuring the value for the retries parameter. Make sure you un-comment this parameter too for the configuration to take effect.
- Save the changes to the file.
- The IT360 service requires a restart when changes are made to this file. So, restart IT360 for the changes to take effect.
Note: The above configuration is recommended only if there is latency.
- Addressing SNMP Timeout Issue : The default SNMP query timeout to variables in a device is 5 seconds. If there is a delay in the agent response for some devices, you can globally increase the SNMP timeout as follows:
- From <IT360-Home>/networks/conf folder, open the file NmsProcessesBE.conf
- Look for the following default entry in this file:
PROCESS com.adventnet.nms.poll.Collector
ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999
- Include the additional parameter DATA_COLLECTION_SNMP_TIMEOUT 15. The changed entry will then be as shown below:
PROCESS com.adventnet.nms.poll.Collector
ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999 DATA_COLLECTION_SNMP_TIMEOUT 15
- Save the changes and restart IT360 Service.
- SNMP Data-collection : By default, IT360 uses 12 threads for SNMP polling. The assumption is that each monitored device has a minimum of 10 polled data (monitored resources such as CPU, memory, incoming traffic, outgoing traffic, errors etc). Each Interface object has 11 polled data, which include RxTraffic, TxTraffic, Bandwidth Utilization, Errors, Discards etc. Depending on the number of polled data, you can increase the number of datapoll threads. The steps to achieve this are given below:
- From <IT360-Home>/networks/conf folder, open the file threads.conf
- Increase the value of datapoll threads from 12 to the required number of threads for SNMP polling.
- Save changes and restart IT360 Service.
- The following reference table can be used to decide the number of threads:
Number of devices/interfaces | Number of datapoll threads | Number of SNMP polled data | Monitoring interval
Up to 500 devices / 5000 interfaces | 12 (default) | Up to 50000 | 15 mins
Beyond the above numbers | 13 - 20 | More than 50000: additional 1 thread for every 5000 polled data | 15 mins
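The reference table above can be turned into a small calculation. The following Python sketch is one possible reading of it, not an official formula: the function name is hypothetical, and both rounding a partial block of 5000 polled data up to a full extra thread and capping the result at 20 (the top of the 13 - 20 range) are assumptions.

```python
import math

DEFAULT_THREADS = 12                 # default number of datapoll threads
BASE_POLLED_DATA = 50000             # polled data covered by the default
POLLED_DATA_PER_EXTRA_THREAD = 5000  # one additional thread per 5000 polled data
MAX_THREADS = 20                     # upper end of the 13 - 20 range (assumed cap)


def datapoll_threads(polled_data_count):
    """Suggest a datapoll thread count for a given number of polled data."""
    if polled_data_count <= BASE_POLLED_DATA:
        return DEFAULT_THREADS
    extra = math.ceil((polled_data_count - BASE_POLLED_DATA)
                      / POLLED_DATA_PER_EXTRA_THREAD)
    return min(DEFAULT_THREADS + extra, MAX_THREADS)


print(datapoll_threads(40000))   # 12
print(datapoll_threads(60000))   # 14
print(datapoll_threads(200000))  # 20
```

The value returned is what you would enter for datapoll in threads.conf before restarting the IT360 service.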
- Database Connection Pool : If the number of PolledData is over 50000, the number of non-transaction connections can be increased to a value in the range of 7 to 10 (the default being 6 connections). Here is how you configure it:
- From <IT360-Home>/networks/conf folder, open the file database_params.conf.
- Increase the value of NON_TRANS_CONNECTIONS parameter to the required number.
- Save changes and restart IT360 Service.
- Disabling Unnecessary Polling during scheduled maintenance : Whenever maintenance is scheduled in the network for some devices, you can suspend polling for those devices by scheduling downtime in IT360 at Admin --> Networks --> Downtime Scheduler. This prevents unnecessary requests to network resources that would result in false alerts. Performance also improves, as the devices covered by the schedule do not use the data poll threads.
- Disabling polling for a category : From Admin --> Networks --> Monitoring Intervals, remove selection for the category for which you want to disable polling.
- Specifying Polling Intervals for Devices : From Admin --> Networks --> Monitoring Intervals, configure a smaller monitoring interval for critical categories like servers or routers, and a longer one for other categories like printers etc. The recommended interval for very critical devices is 5 minutes, although an interval as short as 1 minute can be set for a very few devices.
- Device Dependencies : False alerts are triggered when a set of monitored devices is behind another device (a firewall, router etc). The requests sent to these devices are routed through the firewall or router, and if that device goes down, all the devices behind it are deemed to be down. Configuring device dependencies will prevent unnecessary polling of the devices behind the device they depend on.
- Increase JVM heap size : Memory tuning can be achieved by editing the following parameters in the "wrapper.conf" file available under the directory <IT360_Home>/networks/conf
- wrapper.java.initmemory – Initial Java Heap Size
- wrapper.java.maxmemory – Maximum Java Heap size (Below 2.5K interfaces - 2048, 2.5K to 5K interfaces - 4096, 5K to 8K interfaces - 4096)
- Alerting on Datacollection Problem :
- Go to the <IT360_Home>/networks/conf directory and take a backup of the NmsProcessesBE.conf file available in that directory in some other directory, say C:
- Check for the "PROCESS com.adventnet.nms.poll.Collector" in that file.
- The next line starts with "ARGS". Add the attribute "GENERATE_DATACOLL_EVENT true" at the end of that line. Once added, the line will look as given below.
- ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999 GENERATE_DATACOLL_EVENT true
- Save this file and restart the IT360 Probe service for the above change to take effect.
- Once restarted, whenever a problem occurs in data collection, an alert will be generated in the Networks module. The alert can be viewed under the 'Alarms' tab.
- One word of caution: if there are a lot of SNMP timeouts or SNMP related problems during SNMP data collection, a large number of alerts may be generated in the IT360 Probe service.
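The manual edit of the Collector entry described in the steps above can also be scripted. The Python sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, it works on the text of the file and never touches the disk, and it assumes the ARGS line directly follows the PROCESS line as described in this guide. Take the backup mentioned above before applying anything like this to a real NmsProcessesBE.conf.

```python
def add_collector_arg(conf_text, name, value):
    """Append 'name value' to the ARGS line of the Collector entry, if missing."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.split() == ["PROCESS", "com.adventnet.nms.poll.Collector"]:
            args_line = lines[i + 1]
            tokens = args_line.split()
            if not tokens or tokens[0] != "ARGS":
                raise ValueError("expected an ARGS line after the Collector entry")
            if name not in tokens:
                # Parameter not present yet: add it at the end of the line.
                lines[i + 1] = f"{args_line.rstrip()} {name} {value}"
            return "\n".join(lines) + "\n"
    raise ValueError("Collector PROCESS entry not found")


sample = (
    "PROCESS com.adventnet.nms.poll.Collector\n"
    "ARGS POLL_OBJECTS_IN_MEMORY 25 POLL_JDBC true MAX_OIDS_IN_ONE_POLL 15 "
    "AUTHORIZATION true DATA_COLLECTION_QUERY_INTERVAL 120000 "
    "PASS_THRO_ALL_POLLING_OBJECTS true CLEAN_DATA_INTERVAL 999999\n"
)
updated = add_collector_arg(sample, "GENERATE_DATACOLL_EVENT", "true")
print(updated)
```

Calling the function a second time with the same parameter leaves the text unchanged, so it is safe to re-run. The same helper could be used for DATA_COLLECTION_SNMP_TIMEOUT 15.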
4. Console
- User Synch Interval from Active Directory : The synchronization interval for importing users from Active Directory should be set to not less than one day if the number of users is high. The setting can be found under Admin --> General --> Active Directory --> Import users from Active Directory.