Common Issues in Client Environments

26 October, 2018


  1) Disk Space or Filesystem Utilisation Issues
You will see tickets, alerts, or emails with a subject like the one below:
“Filesystem Utilization for / is > 95% current value = 97%”
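  1. Log in to the server that has the issue.
  2. Run the following command:
find / -type f -size +100000k
It lists every file larger than about 100 MB (100000 KB), so the output is not limited to files in the GB range. The suffix is a lowercase k, which is what GNU find accepts; on other platforms, check the find man page for the supported units.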
  3. a) If it is a live file (a file with a current timestamp, or one that is still being written to), back it up to an alternate location, truncate the live file, and then gzip the backup. The alternate location should be on a different filesystem, otherwise the copy consumes the space you are trying to free.
cp hugefilename /tmp/hugefilename.yyyymmdd
> hugefilename
gzip /tmp/hugefilename.yyyymmdd
     b) If it is not a live file, simply take a backup of it and gzip the backup.
Repeat the above steps for any other huge files.
  4. Check the disk space again.
df -h
  5. Update the ticket with your comments and inform the CSL.
For Saba (Server): A disk space alert is usually caused by an accumulation of .tmp and .rpt files under the /sabaweb/temp folder, which can be deleted. This usually frees enough space to bring utilization down from 95-100% to around 30%.
Special Case: If the space is held by a file that is still open, bounce the JVM. For Saba we cannot bounce the JVM ourselves, because our current access is restricted to the web layer only, so inform the CSL and update the case accordingly.
Note: Please do NOT delete any file without prior intimation and permission from the client.
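The find, backup, truncate, and gzip steps above can be sketched as two small shell functions. This is only a minimal sketch under stated assumptions: the function names list_big_files and archive_live_file are hypothetical, the k size suffix assumes GNU find, and the backup directory is whatever alternate location you have agreed with the client.

```shell
# list_big_files DIR MIN_KB
# Print regular files under DIR that are larger than MIN_KB kilobytes,
# without crossing into other mounted filesystems (-xdev).
list_big_files() {
    find "$1" -xdev -type f -size +"$2"k 2>/dev/null
}

# archive_live_file FILE BACKUP_DIR
# Copy FILE to BACKUP_DIR with a yyyymmdd suffix, truncate the original
# in place (so the writing process keeps its file handle), then gzip the copy.
archive_live_file() {
    src=$1
    dest="$2/$(basename "$src").$(date +%Y%m%d)"
    cp -p "$src" "$dest" || return 1
    : > "$src"
    gzip -f "$dest"
}
```

For example, list_big_files / 100000 mirrors the find command in step 2. Anything the application writes between the copy and the truncation is lost, so this is best done at a quiet moment.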
  2) Service Stopped/Not Running Issues
You will see tickets, alerts, or emails with a subject like the ones below:
“KOORDSWBS01 - /opt/apache2/bin/httpd -f /opt/apache2/conf/httpd.conf_9001”
“/opt/apache2/bin/httpd -f /opt/apache2/conf/httpd.conf_9001 -k start[-1]: Process has stopped”
  1. Log in to the server that has the issue.
  2. Run the following command to check whether the service is really stopped.
ps -ef | grep httpd
  3. a) If the service is not running, start it. For the alert above, start httpd with the following command:
root@koordswbs01 # /opt/apache2/bin/httpd -f /opt/apache2/conf/httpd.conf_9001 -k start
     b) If the service is already running, update the ticket and inform the CSL.
  4. Do not assume that a client has the customized USI startup scripts in the /etc/rc3.d folder. They are only a convention, so verify on the server whether they exist.
For Saba: We do not have access to the Application Server. For all such issues, ask the CSL to update the client accordingly. Please make sure that you are logging in with the right permissions.
Note:
i) We generally take care of the Web and App related services.
ii) There might be other services to which we do not have access. In such a case, inform the CSL to contact the respective team.
iii) We might need to restart the App or Web Server after another team restarts its services. For example, we might need to restart the App or Web Server after the DBA restarts the Database Services.
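One pitfall with ps -ef | grep httpd is that the grep command itself can show up in the listing and make a stopped service look alive. The sketch below filters that out. The function names service_lines and is_running are hypothetical, and the pattern you pass is an assumption about how the service appears in the process list.

```shell
# service_lines PATTERN
# Read a `ps -ef` listing on stdin and print the lines matching PATTERN,
# dropping any line that belongs to a grep command.
service_lines() {
    grep -- "$1" | grep -v 'grep'
}

# is_running PATTERN
# Succeed (exit status 0) if at least one matching process is listed.
is_running() {
    ps -ef | service_lines "$1" > /dev/null
}

# Example: only start httpd when the instance from the alert is not running.
# is_running 'httpd.conf_9001' || echo "httpd on port 9001 is down"
```

Matching on the configuration file name rather than on httpd alone helps when several Apache instances run on the same server.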
  3) High CPU and Memory Utilization Issues
You will see tickets, alerts, or emails with a subject like the ones below:
“KOORDSAPS01 - CPU utilization is > 95% current value = 100%”
OR
“VILSUAAH01 - 15 minute load average is >= 10.00 current value = 40.79”
  1. Log in to the server that has the issue.
  2. Run the “top” command to see which process is consuming the CPU.
  3. a) If the “java” process is consuming all the CPU, the Application Server is the most likely cause. Restart the Application Server.
     b) If a process named “sysedge” is consuming all the CPU, inform the CSL and ask them to contact the monitoring team to fix it.
     c) If the process name is something like “bkbman”, backup jobs are most likely running on the server. Inform the CSL and ask them to contact the Backup team to fix it.
     d) If the “oracle” process is consuming all the CPU, inform the CSL and ask them to contact the DBAs.
Note:
i) We generally take care of the Web and App related services.
ii) There might be other services to which we do not have access. In such a case, inform the CSL to contact the respective team.
iii) We might need to restart the App or Web Server after another team restarts its services. For example, we might need to restart the App or Web Server after the DBA restarts the Database Services.
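The decision list above (java, sysedge, bkbman, oracle) can be written down as a small lookup, which is handy when the same triage is repeated across many alerts. This is a sketch only: the function names top_cpu_process and next_action are hypothetical, and the "investigate further" fallback is an assumption added for processes the list does not cover.

```shell
# top_cpu_process
# Read "pcpu command" pairs on stdin and print the command with the
# highest CPU percentage.
top_cpu_process() {
    sort -k1,1nr | awk 'NR == 1 { print $2 }'
}

# next_action PROCESS_NAME
# Print the follow-up suggested in the steps above for that process.
next_action() {
    case "$1" in
        java)    echo "restart the Application Server" ;;
        sysedge) echo "inform the CSL: monitoring team" ;;
        bkbman*) echo "inform the CSL: backup team" ;;
        oracle)  echo "inform the CSL: DBAs" ;;
        *)       echo "investigate further" ;;
    esac
}

# Example (sed 1d drops the header line of the ps output):
# next_action "$(ps -eo pcpu,comm | sed 1d | top_cpu_process)"
```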
  4) Sysedge Heartbeat Not Received Issues
You will see tickets, alerts, or emails with a subject like the one below:
“VILSUAAH05 - Sysedge heartbeat not received for 15 minutes. The machine has lost trap connectivity to netcool, sysedge has crashed, or the box is hung.”
This generally happens if the monitoring agent is stopped or hung, the trap server IP is incorrect in the agent's configuration file, or the server has lost its network connection. Generally this issue is handled by GEMC or the Monitoring team. In cases where we are expected to handle this issue:
  1. Log in to the server and check whether sysedge is running.
ps -ef | grep sysedge
  2. Check whether the trap server IP in the agent's configuration file is correct.
Note: Generally all servers in the same client environment have the same trap server IP. For example, all the servers in the Kohler environment have the same trap server IP.
  3. Check whether there are any stale or hung mount points using df -h or df -k.
     i) If there are any stale or hung mount points, unmount and remount them.
     ii) If there are any NFS mounts exported from another server, check whether the two servers can communicate with each other.
  4. If none of the above works, inform the onshore team and the client, and get the box rebooted.
Note: Please do not reboot the box without informing the client and the onshore team.
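Because df itself can block on a hung NFS mount, it sometimes helps to list the NFS mount points first and then probe them one at a time. The sketch below only does the listing. The function name nfs_mount_points is hypothetical, and it assumes a mount table whose first three columns are device, mount point, and filesystem type, as in /proc/mounts on Linux or /etc/mnttab on Solaris.

```shell
# nfs_mount_points
# Read a mount table on stdin (device, mount point, fstype, ...) and print
# the mount points whose filesystem type is nfs, nfs3, nfs4, and so on.
nfs_mount_points() {
    awk '$3 ~ /^nfs[0-9]*$/ { print $2 }'
}

# Example:
# nfs_mount_points < /proc/mounts     # Linux
# nfs_mount_points < /etc/mnttab      # Solaris
```

Each mount point it prints can then be checked individually with df -k followed by the mount point, so that one hung mount does not hide the state of the others.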
  5) Generate a Log File
Before contacting BEA Technical Support for help with cluster-related problems, collect diagnostic information. The most important information is a log file with multiple thread dumps from a Managed Server. The log file is especially important for diagnosing cluster freezes and deadlocks. Remember: a log file that contains multiple thread dumps is a prerequisite for diagnosing your problem.
  1. Stop the server.
  2. Remove or back up any log files you currently have. You should create a new log file each time you boot a server, rather than appending to an existing log file.
  3. Start the server with this command, which turns on verbose garbage collection and redirects both the standard error and standard output to a log file:
% java -ms64m -mx64m -verbose:gc -classpath $CLASSPATH -Dweblogic.domain=mydomain -Dweblogic.Name=clusterServer1 -Djava.security.policy==$WL_HOME/lib/weblogic.policy weblogic.Server >> logfile.txt 2>&1
The trailing 2>&1 is what sends standard error to the same file in sh-compatible shells; in csh, use >>& logfile.txt instead. Redirecting both standard error and standard output places thread dump information in the proper context with server informational and error messages and provides a more useful log.
  4. Continue running the cluster until you have reproduced the problem.
  5. If a server hangs, use kill -3 <pid> (Unix) or <Ctrl>-<Break> (Windows) to create the thread dumps needed to diagnose your problem. Do this several times on each server, about 5-10 seconds apart, to help diagnose deadlocks.
  6. Compress the log file using a Unix utility:
% tar czf logfile.tar.gz logfile.txt
or zip it using a Windows utility.
  7. Attach the compressed log file to an e-mail to your BEA Technical Support representative. Do not cut and paste the log file into the body of an e-mail.
  • If the compressed log file is too large, you can use the BEA Customer Support FTP site.
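Taking several thread dumps a few seconds apart is easy to get wrong by hand, so the step can be wrapped in a small loop. This is a minimal sketch: the function name take_thread_dumps and its optional arguments are hypothetical. The fourth argument exists only so that the loop can be dry-run with signal 0, which checks that the process exists without sending it anything.

```shell
# take_thread_dumps PID [COUNT] [INTERVAL] [SIGNAL]
# Send SIGNAL (default 3, i.e. QUIT) to PID COUNT times (default 3),
# sleeping INTERVAL seconds (default 5) between signals.
# Returns non-zero as soon as the signal cannot be delivered.
take_thread_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    sig=${4:-3}
    i=0
    while [ "$i" -lt "$count" ]; do
        kill -"$sig" "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: five dumps, 10 seconds apart, from the JVM with process ID 24548.
# take_thread_dumps 24548 5 10
```

The dumps land wherever the JVM's standard output was redirected when the server was started, for example logfile.txt in step 3.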
  6) When Weblogic gives an out of memory error, check the following things:
Weblogic admin: mydomain > servers 11
Determine which server is lowest on memory.
grep for errors in reedi:/usr/local/bea81/user_projects/mydomain2/mydomain2.log
Look for out of memory errors or stuck thread errors.
  7) How do I get a thread dump from a hung java process?
When we redeploy our web apps through a script, it occasionally hangs the admin server. This is a bug that BEA fixed in SP3, which we have not applied. To get a thread dump, send the hung process signal 3 (SIGQUIT):
kill -3 <pid>
  This will cause the process to write out the state of its threads to STDOUT, which we have directed to nohup.out. The output from a hung admin server looks like this:
Full threads dump Java Hotspot(TM) Server VM (1.4.1_02-ea-b01 mixed mode):

"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0xa0f490 nid=0x30 runnable [dc9ff000..dc9ffc24]
        at weblogic.socket.PosixSocketMuxer.poll(Native Method)
        at weblogic.socket.PosixSocketMuxer.processSockets(
        - locked <eab9e280> (a java.lang.String)
        at weblogic.socket.SocketReaderRequest.execute(
        at weblogic.kernel.ExecuteThread.execute(
        at

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x749490 nid=0x2f waiting for monitor entry [dcaff000..dcaffc24]
        at weblogic.socket.PosixSocketMuxer.processSockets(
        - waiting to lock <eab9e280> (a java.lang.String)
        at weblogic.socket.SocketReaderRequest.execute(
        at weblogic.kernel.ExecuteThread.execute(
        at
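In a long dump it is tedious to match each "waiting to lock" line to the thread that holds the lock. The awk sketch below pairs them up by lock ID. It is an illustration, not a BEA tool: the function name lock_report is hypothetical, and it assumes the dump layout shown above, where each thread header starts with a double quote in the first column.

```shell
# lock_report [FILE...]
# Read a thread dump and, for every lock that some thread is waiting on,
# print the holding thread (if seen) followed by the waiting threads.
lock_report() {
    awk '
    function lock_id(s) { sub(/.*</, "", s); sub(/>.*/, "", s); return s }
    # Thread header: remember the quoted thread name.
    /^"/ { thread = $0; sub(/^"/, "", thread); sub(/".*/, "", thread) }
    /- waiting to lock </ { id = lock_id($0); waiters[id] = waiters[id] "\n    waiting: " thread }
    /- locked </          { id = lock_id($0); holder[id] = thread }
    END {
        for (id in waiters)
            printf "lock %s held by: %s%s\n", id, ((id in holder) ? holder[id] : "unknown"), waiters[id]
    }
    ' "$@"
}

# Example: lock_report nohup.out
```

A waiter on its own is not proof of a hang. The pairing becomes interesting when the same holder and waiters appear unchanged across several dumps taken seconds apart.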
  8) Weblogic startup error
Just noticed this error in weblogic's stdout:
<Nov 29, 2006 2:12:49 PM PST> <Warning> <HTTP> <BEA-101248> <[ptagis]: Deployment descriptor "web.xml" is malformed. Check against the DTD: org.xml.sax.SAXParseException: The content of element type "web-app" must match "(icon?,display-name?,description?,distributable?,context-param*,filter* filter-mapping*,listener*,servlet*,servlet-mapping*,session-config?, mime-mapping*,welcome-file-list?,error-page*,taglib*,resource-env-ref*, resource-ref*,security-constraint*,login-config?,security-role*,env-entry*, ejb-ref*,ejb-local-ref*)". (line 109, column 11).>
  This is still an error. Here is what the beginning of web.xml looks like:
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" " _2_3.dtd">
  Info from BEA tech support
This error indicates that the element entries in the web.xml file are out of order, for example when the <welcome-file> entry comes before the <servlet> and <servlet-mapping> entries. The web.xml elements must appear in the following order:
icon, display-name, description, distributable, context-param, filter, filter-mapping, listener, servlet, servlet-mapping, session-config, mime-mapping, welcome-file-list, error-page, taglib, resource-env-ref, resource-ref, security-constraint, login-config, security-role, env-entry, ejb-ref, ejb-local-ref.
  This is an error because the mime-mapping must come before the welcome-file-list. But simply moving the mime-mapping stanza to the proper place results in a more significant error from the weblogic parser that prevents deployment.
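Module Name: ptagis, Error: [HTTP:101179][HTTP] Error occurred while parsing descriptor in Web application "/dsk2/ptagis-1.0/web/ptagis" [Path="/dsk2/ptagis-1.0/web", URI="ptagis" org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
That SAX message means the parser found markup after the closing </web-app> tag, so the moved stanza most likely ended up outside the root element rather than in its proper place inside it.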
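The ordering rule from BEA support can be checked before deployment with a small awk filter. This is a rough sketch, not a DTD validator: the function name check_webxml_order is hypothetical, it expects each opening tag to sit on one line, and it deliberately skips icon, display-name, and description because those names also occur nested inside other elements.

```shell
# check_webxml_order [FILE...]
# Report top-level web.xml elements that appear after an element which the
# Servlet 2.3 DTD places later. Exit status is 1 if any are found.
check_webxml_order() {
    awk '
    BEGIN {
        n = split("distributable context-param filter filter-mapping listener servlet servlet-mapping session-config mime-mapping welcome-file-list error-page taglib resource-env-ref resource-ref security-constraint login-config security-role env-entry ejb-ref ejb-local-ref", order, " ")
        for (i = 1; i <= n; i++) rank[order[i]] = i
        last = 0; bad = 0
    }
    {
        line = $0
        # Walk over every opening tag on the line; closing tags do not match.
        while (match(line, "<[a-z-]+[ />]")) {
            tag = substr(line, RSTART + 1, RLENGTH - 2)
            line = substr(line, RSTART + RLENGTH)
            if (tag in rank) {
                if (rank[tag] < last) {
                    printf "line %d: <%s> appears after <%s>\n", NR, tag, order[last]
                    bad = 1
                } else {
                    last = rank[tag]
                }
            }
        }
    }
    END { exit bad }
    ' "$@"
}

# Example: check_webxml_order WEB-INF/web.xml
```

For the descriptor discussed above it would flag the mime-mapping element, since that element follows welcome-file-list in the file but precedes it in the DTD.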
  9) Weblogic stuck thread
Top reports that the java process running the weblogic appserver on MS is running at high CPU usage:
24548 root      88   0    0  878M  662M cpu/0   68.4H 46.97% java
    Normal CPU usage is between 5% and 10%. Looking at the nohup.log for reedi (server), we see the stuck thread message:  
<Dec 11, 2006 10:51:32 AM PST> <Warning> <WebLogicServer> <BEA-000337> <ExecuteThread: '7' for queue: 'weblogic.kernel.Default' has been busy for "600" seconds working on the request "Http Request: /ptagis/loginAction.jsp", which is more than the configured time (StuckThreadMaxTime) of "600" seconds.>
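To find every such message in a large nohup log, the BEA-000337 lines can be reduced to the thread number, queue, and busy time. This is a sketch that assumes the message layout shown above; the function name stuck_thread_summary is hypothetical.

```shell
# stuck_thread_summary [FILE...]
# Print one short line per BEA-000337 (stuck thread) warning in the log.
stuck_thread_summary() {
    grep 'BEA-000337' "$@" |
        sed -n "s/.*<ExecuteThread: '\([0-9]*\)' for queue: '\([^']*\)' has been busy for \"\([0-9]*\)\" seconds.*/thread \1 queue \2 busy \3s/p"
}

# Example: stuck_thread_summary nohup.log
```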
  A stuck thread message like this calls for a restart of the appserver. Ask for a graceful restart of appserver_reedi from the weblogic admin gui. If the admin gui is unresponsive (because it can't talk to the stuck thread), you may be forced to use kill, or as a last resort kill -9, on the java process on reedi. Then restart the weblogic appserver.
  10) No suitable Driver and Connection Pool not created
I have tested the connection using utils.dbping provided by BEA and have been successful. I made sure that the classpath and path variables for the server are identical to those in place when I run dbping. I still have the same problem! The following is my path and classpath as verified by the server console upon startup:
CLASSPATH=C:\jdev_9i\jdbc\lib\classes12.jar;C:\jdev_9i\jdbc\lib\classes111.jar;C:\bea\jdk131_03\lib\tools.jar;C:\bea\weblogic700\server\lib\weblogic_sp.jar;C:\bea\weblogic700\server\lib\weblogic.jar
PATH=.;C:\bea\weblogic700\server\bin;C:\bea\jdk131_03\bin;C:\bea\weblogic700\server\bin\oci817_8;C:\ora817\bin
Here, also, is the error message received upon startup:
<Dec 13, 2002 11:39:13 AM MST> <Error> <JDBC> <001060> <Cannot startup connection pool "MyConnection Pool" java.sql.SQLException: No suitable driver>
Solution:
  1. Try using url=jdbc:oracle:oci8:@host:1521.
  2. Check whether the Oracle JDBC driver archive is in the same directory as weblogic.jar, namely {weblogic}/server/lib. If not, copy it from {weblogic}/server/ext/jdbc/oracle/{oracle_version} or from the Oracle jdbc/lib directory.
  3. For the thin driver, the URL must contain thin:@ and not thin@ (the colon before the @ is required).
  11) Unpack a Weblogic 10.3 domain on one of our production servers (SunOS 5.10), but get the following error:
$ /opt/bea10/wlserver_10.3/common/bin/ -template=/tmp/CM.jar -domain=/opt/bea10/user_projects/CM
Error: failed to create the temporary script file
Assuming that this is a privilege problem: where exactly does the unpack utility try to create its temporary script files?
The unpack script calls a Java class, com.bea.plateng.domain.script.Unpacker, so reading the script itself does not reveal the location. I need to ask the sysadmin for the privileges, so an exact directory location is needed. Of course, the error message is so vague that this might also be some other issue. Any ideas?
Solution: unpack needs write permission on one folder and on the file domain-registry.xml. The problem occurs because WebLogic was installed by one user and unpack is being run by a different user. The user running unpack needs write access to the folder $BEA_HOME/wlserver_10.3/common/lib, where unpack writes a temporary file and removes it once the command has finished. In addition, the unpack command updates the file $BEA_HOME/domain-registry.xml. As the install user, use chmod to grant write access to the folder and the file for as long as you need to unpack the domain:
chmod a+rwx $BEA_HOME/wlserver_10.3/common/lib $BEA_HOME/domain-registry.xml
After the domain has been created, change the permissions back to a safe value.
Error: Problem invoking WLST - java.lang.UnsupportedClassVersionError: Bad version number in .class file
Scenario: Trying to stop WebLogic Server
Solution: The JDK version used while creating the domain is not supported. Use a higher version (present in the same WebLogic Server installation location) and point to it during domain creation.
About Author


TekSlate is an online training provider delivering IT skills to individuals and corporates from all parts of the globe. We have proven expertise in meeting the needs of learners looking to upgrade their IT skills. We aim to bring you all the essentials to learn and master new technologies in the market with our articles, blogs, and videos. Build your career success with us by enhancing your most in-demand skills.