If you have `sn_start_frim_file = .true., .true., .true., .true.,`, create the .sno file.
If you have `sn_start_frim_file = .false., .false., .false., .false.,`, we are still looking for the source of the error.
• **forrtl: severe (174): SIGSEGV, segmentation fault occurred**
Can be solved by decreasing the `time_step` and adjusting the `parent_time_step_ratio`.
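Both settings live in the `&domains` section of `namelist.input`; a minimal sketch (the values below are illustrative, not taken from this setup):

```
&domains
 time_step              = 60,            ! e.g. lowered from 90 s
 parent_time_step_ratio = 1, 3, 3, 3,    ! child dt = parent dt / ratio, per domain
/
```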
• **Tile Strategy is not specified. Assuming 1D-Y Total number of tiles is too big for 1D-Y tiling. Going 2D. New tiling is 2x 17**
Reduce `cpu-per-node` in WRF_MAIN.job.
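In a plain SLURM job script this corresponds to something like the directive below; the exact variable name inside WRF_MAIN.job may differ, so treat this as a sketch:

```
#SBATCH --ntasks-per-node=32   # illustrative value; lower it until the tiling warning disappears
```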
• **Problems when restarting**
We can have a problem with one type of grid cell; for example, we think that SNOWPACK melts all the snow in some lake grid cells. SNOWPACK should be fixed, but meanwhile we can try to **hack it** by changing the land use of these grid cells to grass (the category most similar to lakes). That can easily be done in the geo_em files.
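A minimal sketch of that hack in Python, assuming the default MODIS 21-category land use (lakes = 21, grassland = 10 — verify against the metadata of your own geo_em file) and the hypothetical file name `geo_em.d02.nc`:

```python
import numpy as np

LAKE, GRASS = 21, 10  # assumed MODIS 21-category indices; check your geo_em attributes

def lake_to_grass(lu_index):
    """Return a copy of the LU_INDEX field with lake cells reassigned to grassland."""
    out = np.array(lu_index, dtype=float, copy=True)
    out[out == LAKE] = GRASS
    return out

# Applying it to a geo_em file would look roughly like this (needs the netCDF4 package):
#   import netCDF4
#   nc = netCDF4.Dataset("geo_em.d02.nc", "r+")
#   nc.variables["LU_INDEX"][0, :, :] = lake_to_grass(nc.variables["LU_INDEX"][0, :, :])
#   nc.close()
```

Note that `LU_INDEX` only stores the dominant category; for a cleaner fix the `LANDUSEF` fractions in the same file should be adjusted consistently as well.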
The script does not execute real.exe or wrf.exe. That might be because it is linked to a file that does not exist.
• **Input data is acceptable to use: ./restart/wrfrst_d02_2022-03-17_15:00:00**
input_wrf: forcing SIMULATION_START_DATE = head_grid start time due to namelist variable reset_simulation_start
• **Error entering to new domain**
Reduce the `export OMP_STACKSIZE` value in WRF_MAIN.job.
This can also be solved by restarting the simulation one hour earlier. Remember to change the starting time of the domains in `namelist.input`, set `restart = .true.`, and set `sn_start_frim_file` to `.true.` for all the domains that have already run.
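As a sketch, the restart switches live in the `&time_control` section of `namelist.input` (times below are illustrative):

```
&time_control
 start_hour = 14, 14, 14, 14,   ! one hour before the crash
 restart    = .true.,
/
```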
• **Error in `/scratch/snx3000/gsergi/CRYOWRF_ALPS_2019_ssp585/WRF/./wrf.exe': corrupted size vs. prev_size: 0x000000000f202ef0
forrtl: error (76): Abort trap signal**
It seems to be a memory-leak error, possibly because the number of SNOWPACK layers has been exceeded.
• **At line 2065 of file mediation_integrate.f90
Fortran runtime error: End of record**
I found the error in auxhist5. I just commented these lines and it worked:
```
! frames_per_auxhist5 = 8,8,12,12,
```
• In eiger: **Lmod has detected the following error: Swap failed: "PrgEnv-cray" is not loaded. Lmod has detected the following error: The following module(s) are unknown: "cray-parallel-netcdf"**
You forgot to run `module load cray` before the sbatch command.
• Does not necessarily fail, but gives ***WARNING* Time in input file not equal to time on domain *WARNING*
*WARNING* Trying next time in file wrffdda_d01 ...**
This happens if the wrffdda_d0X and wrflowinp_d0X files created by real.exe have a specific time frequency that does not agree with, for example, your restart time. Usually WRF will go through all file entries until it finds the right time information (if the time on the domain is ahead, as would usually happen with restart runs).
• **Program received signal SIGSEGV: Segmentation fault - invalid memory reference.** when entering a higher domain.
• **Message from syslogd@eiger-ln002 at Jun 4 08:09:13 ...
kernel:[Hardware Error]: Corrected error, no action required.
kernel:[Hardware Error]: CPU:66 (17:31:0) MC17_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
kernel:[Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD**
This is a corrected hardware (memory/cache) error on the node; as the message itself says, no action is required.
• **FATAL CALLED FROM FILE: <stdin> LINE: 70
program wrf: error opening wrfinput_d01 for reading ierr= -1021
MPICH Notice [Rank 0] [job id 3107422.0] [Tue Jun 4 08:34:42 2024] [nid002237] - Abort(1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0**
wrf.exe likely cannot open `wrfinput_d01`; check that real.exe produced it and that the file exists (or is correctly linked) in the run directory.