INDEX
Explanations
phrases related to identifying and discussing risks
phrases related to risks and potential dangers
Negative Logits
Token         Logit
ergy          -0.94
gdala         -0.78
cle           -0.78
gran          -0.76
Hat           -0.71
Nap           -0.71
eve           -0.70
eenth         -0.70
Kinnikuman    -0.69
MT            -0.69
Positive Logits
Token         Logit
risks         1.09
risk          0.93
dangers       0.88
pitfalls      0.88
afety         0.87
hazards       0.87
endanger      0.84
crow          0.78
jeopard       0.77
consequences  0.77
Activations Density: 0.013%