INDEX
Explanations
phrases related to being at risk
phrases indicating potential hazards or dangers
Negative Logits
anche            -0.81
rix              -0.76
elf              -0.67
ript             -0.65
ovy              -0.65
headquartered    -0.64
ao               -0.63
ann              -0.62
erent            -0.62
MX               -0.61
Positive Logits
ħĭ               0.84
endanger         0.84
angering         0.75
undermin         0.74
casualties       0.72
schizophren      0.70
Survive          0.68
dism             0.68
pitfalls         0.68
starvation       0.67
Activations Density 0.022%