WebLogic instances can fail to start, and maintenance covers the Administration Server as well as all Managed Servers. Troubleshooting in WebLogic means using the server logs to find the cause of a server failure (server crash, server shutdown, SSL issue, stuck threads, and so on) when one occurs.
Troubleshooting Common Problems
Before You Start the Cluster
You can do a number of things to help prevent problems before you boot the cluster.
Check for a Cluster License
Your WebLogic Server license must include the clustering feature. If you try to start a cluster without a clustering license, you will see the error message Unable to find a license for clustering.
Check the Server Version Numbers
All Managed Servers in a cluster and the cluster’s Administration Server should run the same version of WebLogic Server. The major and minor version numbers (e.g., 9.1), service packs, and attached patch levels should be the same across the cluster.
Check the Multicast Address
A problem with the multicast address is one of the most common reasons a cluster does not start or a server fails to join a cluster.
A multicast address is required for each cluster. The multicast address can be an IP number between 224.0.0.0 and 239.255.255.255, or a host name with an IP address within that range.
You can check a cluster’s multicast address and port on its Configuration -> Multicast tab in the Administration Console.
For each cluster on a network, the combination of multicast address and port must be unique. If two clusters on a network use the same multicast address, they should use different ports. If the clusters use different multicast addresses, they can use the same port or accept the default port, 7001.
Before booting the cluster, make sure the cluster’s multicast address and port are correct and do not conflict with the multicast address and port of any other clusters on the network.
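The two checks above (address in the multicast range, and a unique address and port combination per cluster) can be sketched in plain Java before the cluster is booted. This is only an illustrative pre-flight check, not a WebLogic utility: the class name MulticastPreflight, its methods, and the sample addresses are all hypothetical, and it relies only on the standard java.net.InetAddress class.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-flight helper; not part of WebLogic.
class MulticastPreflight {

    // True if the address falls in the IPv4 multicast range
    // 224.0.0.0 - 239.255.255.255. A host name is resolved via DNS first.
    static boolean isMulticastAddress(String addressOrHost) {
        try {
            return InetAddress.getByName(addressOrHost).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false; // unresolvable names cannot be valid cluster addresses
        }
    }

    // True if any "address:port" combination appears more than once,
    // i.e. two clusters would share the same multicast address AND port.
    static boolean hasConflict(String... endpoints) {
        Set<String> seen = new HashSet<>();
        for (String endpoint : endpoints) {
            if (!seen.add(endpoint)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample values are assumptions for illustration only.
        System.out.println(isMulticastAddress("239.192.0.1"));
        System.out.println(isMulticastAddress("192.168.1.10"));
        System.out.println(hasConflict("239.192.0.1:7001", "239.192.0.1:7001"));
    }
}
```

Two clusters that share an address but use different ports pass the conflict check, which matches the rule stated above.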
The errors you are most likely to see if the multicast address is bad are:
- Unable to create a multicast socket for clustering
- Multicast socket send error
- Multicast socket receive error
Check the CLASSPATH Value
Make sure the value of CLASSPATH is the same on all managed servers in the cluster. CLASSPATH is set by the setEnv script, which you run before you run startManagedWebLogic to start the managed servers.
By default, setEnv sets this value for CLASSPATH (as represented on Windows systems):
If you change the value of CLASSPATH on one managed server, or change how setEnv sets CLASSPATH, you must change it on all managed servers in the cluster.
Check the Thread Count
Each server instance in the cluster has a default execute queue, configured with a fixed number of execute threads. To view the thread count for the default execute queue, choose the Configure Execute Queue command in the Advanced Options portion of the Configuration -> General tab for the server. The default thread count for the default queue is 15, and the minimum value is 5. If the value of Thread Count is below 5, change it to a higher value so that the Managed Server does not hang on startup.
After You Start the Cluster
This section describes first troubleshooting steps to perform if you have problems trying to start a cluster.
Check Your Commands
If the cluster fails to start, or a server fails to join the cluster, the first step is to check any commands you have entered, such as startManagedWebLogic or a java interpreter command, for errors and misspellings.
Getting a JRockit Thread Dump under Linux
If you use the JRockit JVM under Linux, use one of the following methods to generate a thread dump.
- Use the weblogic.admin THREAD_DUMP command.
- If the JVM’s management server is enabled (by starting the JVM with the -Xmanagement option), you can generate a thread dump using the JRockit Management Console.
- Use kill -3 PID, where PID is the root of the process tree.
To obtain the root PID, run: ps -efHl | grep 'java'
Use a grep argument that is a string found in the process stack and that matches the server startup command. The first PID reported is the root process, assuming that the ps output has not been piped to another routine.
Under Linux, each execute thread appears as a separate process in the Linux process stack. To use kill -3 on Linux, the PID you supply must match the PID of the main WebLogic execute thread; otherwise no thread dump is produced.
Check Garbage Collection
If you are experiencing cluster problems, you should also check the garbage collection on the managed servers. If garbage collection is taking too long, the servers will not be able to make the frequent heartbeat signals that tell the other cluster members they are running and available.
If garbage collection (either first or second generation) is taking 10 or more seconds, you need to tune heap allocation (the -Xms and -Xmx parameters) on your system.
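One way to see how long collections take from inside a JVM is the standard java.lang.management API. The sketch below is an assumption-laden illustration rather than a WebLogic feature: it needs Java 5 or later, it reports only on the JVM it runs in, and the class name GcPauseCheck and its helper method are hypothetical. It prints an average per collector, so a single long pause can still hide behind a low average.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; reports on the JVM it is running in.
class GcPauseCheck {

    // Average time per collection in milliseconds.
    // Returns 0 when no collections were recorded (or the count is undefined, -1).
    static double averagePauseMillis(long collectionCount, long collectionTimeMillis) {
        if (collectionCount <= 0) {
            return 0.0;
        }
        return (double) collectionTimeMillis / collectionCount;
    }

    public static void main(String[] args) {
        // One bean per collector, typically young and old generation.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            long timeMillis = gc.getCollectionTime();
            System.out.printf("%s: %d collections, %d ms total, %.1f ms average%n",
                    gc.getName(), count, timeMillis, averagePauseMillis(count, timeMillis));
        }
    }
}
```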
Check MultiCast Address:
You can verify that multicast is working by running utils.MulticastTest from one of the managed servers.
Monitor Weblogic through the command line
Everything that is visible through the Weblogic admin console (http://localhost:7001/console) can be accessed through a command line java tool. This tool can be used to gather data about the weblogic servers via scripting. There are at least two ways to get runtime monitoring data about weblogic processes. This document covers the use of the java classes that get information from management beans (mbeans). There is also a java tool that allows for browsing the mbean tree like an ftp client: Weblogic Scripting Tool (WLST).
Set the Environment:
Set the java environment
There is a script for setting the CLASSPATH and PATH so that this tool can work. On pitblade (WL Server), the setWLSEnv script is /dsk2/local/bea81/weblogic81/server/bin/setWLSEnv.sh
pitblade:II:root: > source setWLSEnv.sh
Your environment has been set.
Verify that the java environment is set properly. Success:
pitblade:II:root: > java weblogic.Admin
weblogic.Admin is a command-line utility for managing WebLogic Server. Try:
weblogic.Admin help LIFECYCLE Starting, stopping, discovering servers
weblogic.Admin help INFO Retrieving info about WebLogic Server
weblogic.Admin help JDBC Working with JDBC connection pools
weblogic.Admin help MBEAN Working with WebLogic Server MBeans
weblogic.Admin help CLUSTER Working with clusters
weblogic.Admin help ALL Help for all commands
Usage: java [<SSL trust options>] weblogic.Admin
[ [-url | -adminurl] [<protocol>://]<listen-address>:<port>]
-username <username> -password <password>
More info available at: http://e-docs.bea.com/wls/docs81/admin_ref/cli.html
Failure (the environment has not been set, so the class is not on the CLASSPATH):
pitblade:II:root: > java weblogic.Admin
Exception in thread "main" java.lang.NoClassDefFoundError: weblogic/Admin
pitblade:II:root: > java weblogic.Admin -username user -password pass GETSTATE myserver
Current state of “myserver” : RUNNING
get server config
pitblade:II:root: > java weblogic.Admin -username user -password pass GET -pretty -type Server
Test if weblogic is running:
On command prompt issue this command:
ps -eaf | grep java
You should get a response like this:
reedi.psmfc.org:C1:root: > ps -eaf | grep java
root 6233 6221 0 Jan 31 ? 2:50 /usr/local/bea81/jdk141_02/bin/java -server -Xms32m -Xmx200m -XX:MaxPermSize=12
root 6273 6261 0 Jan 31 ? 361:59 /usr/local/bea81/jdk141_02/bin/java -server -Xms256m -Xmx512m -Djava.awt.headle
root 1121 6273 0 Feb 08 ? 15:29 /usr/local/bea81/jdk141_02/jre/bin/java -Xms256m -Xmx1024m -cp /global/ds1/pitw
root 2087 20643 0 13:44:11 pts/1 0:00 grep java
Restart weblogic on the cluster
- Use the admin console to request a graceful shutdown of the server
- start a bash shell
- Use this command to start:
nohup ./startManagedWebLogic.sh &
- Run tail -f nohup.out until the server enters the running state
- repeat for each server in the cluster
Restart weblogic on the development server
- Use the admin console to request a graceful shutdown
- start a bash shell on server
- Use this command to start:
nohup ./startWeblogic.sh &
Weblogic and apache debug info
On pitblade (the server), the file /tmp/wlproxy.log is filling the filesystem and crashing the server. Apparently, this file holds debug information generated by Apache about the WebLogic proxy plug-in. Writing it probably causes a severe drain on resources, so it should not generally be active. Here is the relevant stanza from httpd.conf:
Note: The specified location of the log file seems to be ignored. There would not have been a problem if the log file had actually been written where the configuration directed.
#MatchExpression /ptagis/home/regTagAction WebLogicHost=pitblade|WebLogicPort=7001|Idempotent=OFF
After changing this setting, apache needs a restart like this:
Turned off this debugging output on pitblade and bay. The response time seems better as a result.
- Open source load generation tools
- Apache JMeter, Grinder, OpenSTA, Dieseltest
- Starting the sun jvm with -hotspot allows it to dynamically optimize the bytecode
- If you allocate too much memory to a JVM, garbage collection can run slower and make the application stall. For best performance, set the minimum and maximum memory for the JVM to the same value, because growing and shrinking the heap has overhead.
- Start with -verbosegc on the command line. This writes information about garbage collection to stdout. A collection should take less than a second. If GC takes longer than 3 seconds, reduce the heap size so that GC happens more often and does less work each time.
- Generational GC can be turned on (NewSize); this requires HotSpot. It creates a nursery in the heap where partial GCs run more frequently, on the assumption that most objects are short-lived.
- JRockit, only supported on Intel. Beta support for solaris.
- Locate the bottleneck: CPU, IO, DB.
- Network tune: set tcp_time_wait_interval from 240 to 60 on solaris
- The execute thread pool is set to 25 threads in production mode and 15 in development mode.
- Native IO socket readers are much faster than pure Java threads, so check the box that says enable native IO.
- 25 to 65 execute threads is the optimal range.
- A stuck thread cannot be released without rebooting the jvm. When all the threads in the server are stuck, the server’s health is flagged as critical. Nodemanager can be set to automatically restart a server that is critical.
- Object pooling, passivation, activation.
Common WebLogic Server Deadlocks and How to Avoid Them
While there are many possible causes of performance degradation or hangs, this article can’t possibly cover them all. Instead, we’ll look at three common mistakes in WebLogic Server applications that can deadlock the server or bring your performance to a screeching halt.
The best Java tool for diagnosing deadlocks is a Java virtual machine thread dump. A thread dump is a snapshot of the virtual machine’s current state, including stack traces for each Java thread. Many virtual machines also include information about the Java monitors held by each thread. Monitor information is especially useful for diagnosing deadlocks and performance problems in your application. On Windows platforms, you can generate a thread dump by pressing Ctrl-Break in the virtual machine’s window. On Unix systems, a SIGQUIT signal must be sent to the Java virtual machine process. This can be done with a kill -3 <process id>.
The classic deadlock problem is the deadly embrace: Thread 1 owns Lock A and waits on Lock B, Thread 2 owns Lock B and waits on Lock A. These threads are deadlocked and will remain blocked in this state. In many cases, the remaining threads will eventually enter the deadlock by attempting to acquire Lock A or Lock B and waiting. A thread dump is one of the best ways to discover a deadly embrace deadlock. Most virtual machines include a thread state for each Java thread in the dump. The most common thread states are: R – running; MW – monitor wait; CW – condition wait. Threads in the MW state are blocked, waiting to enter a synchronized block and acquire a Java monitor. Since the thread dump includes the Java thread’s stack trace, it’s also possible to determine which monitor is blocking the thread. If multiple threads are in the MW state on the same monitor, it’s a good indication that there’s either a lot of contention for this monitor, or the server is deadlocked. In a deadlock situation, you should be able to determine the other threads blocked in MW and their held monitors.
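The deadly embrace described above can be reproduced and detected in a few lines of plain Java. The following is a minimal sketch, not something WebLogic provides: the class name DeadlyEmbraceDemo and its methods are hypothetical, and it assumes Java 6 or later for ThreadMXBean.findDeadlockedThreads(). The two threads are daemon threads so that the JVM can still exit while they remain blocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo: two threads take the same two locks in opposite order.
class DeadlyEmbraceDemo {

    // Starts a daemon thread that takes 'first', waits until both threads
    // hold their first lock, then blocks forever trying to take 'second'.
    static void startThread(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread owns 'second'.
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    // Creates the deadlock and returns how many threads the JVM reports
    // as deadlocked (0 if none are found within about five seconds).
    static int countDeadlockedThreads() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startThread("thread-1", lockA, lockB, bothHoldFirst);
        startThread("thread-2", lockB, lockA, bothHoldFirst);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked threads: " + countDeadlockedThreads());
    }
}
```

A thread dump taken while this runs shows both threads blocked waiting for a monitor, each waiting on the lock the other one holds.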
There are two classic techniques for solving deadly embrace deadlocks: deadlock avoidance and deadlock detection. Deadlock avoidance is merely changing or structuring your code so that it can’t hit the deadlock case. A common solution is to implement lock ordering. If Lock A is always acquired before Lock B then you can’t have a deadly embrace on these two locks. It’s also a good idea to minimize the time that a monitor is held. Every extra line of code that runs under a monitor is another chance for someone to add a deadlock. Deadlock detection is commonly implemented by databases for their locks, but it doesn’t usually find its way into programming languages. In deadlock detection, deadlocks are automatically discovered and one or more deadlock participants (known as the victims) are killed and release their locks to break the deadlock. Java virtual machines do not break deadlocks on Java monitors, so deadlock avoidance is necessary.
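Lock ordering can be illustrated with a small example. The sketch below is hypothetical throughout (the Account class, the id-based ordering rule, and the transfer scenario are assumptions chosen for illustration). Whatever direction a transfer goes, the account with the lower id is always locked first, so two opposing transfers can never hold one lock each while waiting for the other.

```java
// Hypothetical example of deadlock avoidance through lock ordering.
class LockOrdering {

    static final class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always locks the account with the lower id first, regardless of
    // which account is the source and which is the destination.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Runs many transfers in opposite directions on two threads and
    // returns the final balances once both threads have finished.
    static long[] runOpposingTransfers() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10000; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10000; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.balance, b.balance };
    }

    public static void main(String[] args) {
        long[] balances = runOpposingTransfers();
        System.out.println(balances[0] + " " + balances[1]);
    }
}
```

Without the ordering step (locking from first and to second), the two threads in runOpposingTransfers could each take one lock and wait for the other indefinitely.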
Another common way to deadlock an application is what I call the “out of threads” deadlock. Unfortunately, this deadlock often doesn’t show up until a load test or, in the worst case, when your production application receives a lot of traffic. In this scenario, your WebLogic Server is running with a fixed number of threads. The application includes logic where a given request or action performs work in one thread and then blocks on work that must be done in another thread.
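The same pattern can be shown outside WebLogic with a standard java.util.concurrent thread pool. This is only an analogy under stated assumptions: the fixed pool stands in for an execute queue with a fixed thread count, the pool size of one and the 500 ms timeout are arbitrary, and the class and method names are hypothetical. A request that runs in the pool and then waits for work queued to the same pool never finishes once every pool thread is occupied by such a request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo of the "out of threads" deadlock with a fixed pool.
class OutOfThreadsDemo {

    // Runs an outer task on outerPool that blocks on an inner task
    // submitted to innerPool. Returns true if the outer task finishes
    // within 500 ms.
    static boolean outerCompletes(ExecutorService outerPool, ExecutorService innerPool) {
        Callable<String> innerWork = () -> "done";
        Callable<String> outerWork = () -> innerPool.submit(innerWork).get();
        Future<String> outer = outerPool.submit(outerWork);
        try {
            return "done".equals(outer.get(500, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return false; // the outer task is still waiting for a free thread
        } catch (InterruptedException | ExecutionException e) {
            return false;
        }
    }

    // Outer and inner work share one single-threaded pool: the only
    // thread is blocked in the outer task, so the inner task never runs.
    static boolean sharedPoolCompletes() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return outerCompletes(pool, pool);
        } finally {
            pool.shutdownNow();
        }
    }

    // The inner work has its own pool, so it always finds a free thread.
    static boolean separatePoolsComplete() {
        ExecutorService outerPool = Executors.newFixedThreadPool(1);
        ExecutorService innerPool = Executors.newFixedThreadPool(1);
        try {
            return outerCompletes(outerPool, innerPool);
        } finally {
            outerPool.shutdownNow();
            innerPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool completes: " + sharedPoolCompletes());
        System.out.println("separate pools complete: " + separatePoolsComplete());
    }
}
```

With a larger pool the problem only appears once enough concurrent requests arrive to occupy every thread, which is why it tends to surface under load.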