If you have `sn_start_from_file = .true., .true., .true., .true.,`, create the .sno file. If you have `sn_start_from_file = .false., .false., .false., .false.,`, we are still looking for the source of the error.
• **forrtl: severe (174): SIGSEGV, segmentation fault occurred**
This can be solved by decreasing the `time_step` and changing the `parent_time_step_ratio` accordingly.
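As an illustration (the numbers are placeholders, not recommendations), the relevant `&domains` entries in `namelist.input` look like:

```
&domains
 time_step               = 60,        ! decreased from e.g. 90 after the segfault
 parent_time_step_ratio  = 1, 3, 3,   ! child time step = parent time step / ratio
/
```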
• **Tile Strategy is not specified. Assuming 1D-Y Total number of tiles is too big for 1D-Y tiling. Going 2D. New tiling is 2x 17**
Reduce the `cpu-per-node` setting in WRF_MAIN.job.
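For example, assuming WRF_MAIN.job is a Slurm batch script and this setting corresponds to a `#SBATCH` tasks-per-node line (the option name and values here are illustrative, not taken from the actual script):

```
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16   # lower this if the tiling falls back to 2D
```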
• **Problems when restarting**
We can have a problem with one type of file; for example, we think that SNOWPACK melts all the snow in some lake grid cells. So SNOWPACK should be fixed, but meanwhile we can just try to **hack it**: change the land use of these grid cells to grass (the class most similar to lakes). That can easily be done in the geo_em files.
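A minimal sketch of the reclassification logic (the category numbers are assumptions: in the MODIS land-use table lakes are usually category 21 and grassland category 10; check the `LU_INDEX` metadata of your geo_em files, and apply the change there with e.g. netCDF4 or NCO):

```python
# Assumed MODIS land-use categories; verify against your geo_em files.
LAKE, GRASS = 21, 10

def lakes_to_grass(lu_index):
    """Return a copy of a 2-D LU_INDEX field with lake cells set to grassland."""
    return [[GRASS if cell == LAKE else cell for cell in row] for row in lu_index]

field = [[21, 16],
         [10, 21]]
print(lakes_to_grass(field))  # -> [[10, 16], [10, 10]]
```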
• **WRF_real.job or WRF_MAIN.job start and fail without creating a rsl.error.0000 file**
The job does not execute real.exe or wrf.exe. That might be because it is linked to a file that does not exist.
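As a quick diagnostic (assuming the usual WRF run directory with symlinked inputs), GNU `find` can list broken symbolic links:

```shell
# List broken symbolic links in the current run directory
find . -maxdepth 1 -xtype l
```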
• **Input data is acceptable to use: ./restart/wrfrst_d02_2022-03-17_15:00:00**
input_wrf: forcing SIMULATION_START_DATE = head_grid start time
due to namelist variable reset_simulation_start
• **Error when entering a new domain**
Reduce the value of `export OMP_STACKSIZE` in WRF_MAIN.job.
It can also be solved by restarting the simulation one hour before the crash. Remember to change in `namelist.input` the starting time of the domains, set `restart = .true.`, and set `sn_start_from_file = .true.` in all the domains that ran before.
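For example, if the run crashed at 15:00, restarting at 14:00 would look roughly like this in `namelist.input` (times and number of domains are illustrative; the namelist section holding `sn_start_from_file` depends on the CRYOWRF build):

```
&time_control
 start_hour = 14, 14, 14, 14,   ! one hour before the failed time
 restart    = .true.,
/
```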
• **Error in \`/scratch/snx3000/gsergi/CRYOWRF_ALPS_2019_ssp585/WRF/./wrf.exe': corrupted size vs. prev_size: 0x000000000f202ef0
forrtl: error (76): Abort trap signal**
It seems to be a memory-leak error, maybe because the maximum number of SNOWPACK layers has been exceeded.
• **At line 2065 of file mediation_integrate.f90
Fortran runtime error: End of record**
I found the error in auxhist5. I just commented these lines and it worked:
```
! frames_per_auxhist5 = 8,8,12,12,
```
• In eiger: **Lmod has detected the following error: Swap failed: "PrgEnv-cray" is not
loaded. Lmod has detected the following error: The following module(s) are unknown:
"cray-parallel-netcdf"**
You forgot to run `module load cray` before the sbatch command.
• Does not necessarily fail but gives ***WARNING* Time in input file not equal to time on domain *WARNING*
*WARNING* Trying next time in file wrffdda_d01 ...**
This happens if, for example, the wrffdda_d0X and wrflowinp_d0X files created by real.exe have a time frequency that does not agree with your restart time. Usually WRF will go through all file entries until it finds the right time information (if the time on the domain is ahead, as would usually happen with restart runs).
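Those frequencies are fixed in `namelist.input` when real.exe writes the files; for reference (interval values are placeholders), the grid-nudging and lower-boundary entries are of the form:

```
&fdda
 grid_fdda        = 1, 1, 1,
 gfdda_interval_m = 360, 360, 360,   ! minutes between records in wrffdda_d0X
/
&time_control
 auxinput4_interval = 360, 360, 360, ! minutes between records in wrflowinp_d0X
/
```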
• **Program received signal SIGSEGV: Segmentation fault - invalid memory reference.** when entering a higher-resolution domain.
**Message from syslogd@eiger-ln002 at Jun 4 08:09:13 ... kernel:\[Hardware Error\]: CPU:66 (17:31:0) MC17_STATUS\[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-\]: 0xdc2040000000011b**
**Message from syslogd@eiger-ln002 at Jun 4 08:09:13 ... kernel:\[Hardware Error\]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.**
• **-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: <stdin> LINE: 70
program wrf: error opening wrfinput_d01 for reading ierr= -1021
-------------------------------------------
MPICH Notice [Rank 0] [job id 3107422.0] [Tue Jun 4 08:34:42 2024] [nid002237] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0**
**Message from syslogd@eiger-ln002 at Jun 4 08:09:13 ... kernel:\[Hardware Error\]: cache level: L3/GEN, tx: GEN, mem-tx: RD**