INDEX
Explanations
various scenarios or states related to negative experiences such as harm, fighting, or being in need
references to harm, risk, and experiences of suffering
Negative Logits
Fla          -0.58
1897         -0.56
Ala          -0.54
oranges      -0.54
Tunis        -0.54
Ramos        -0.54
doubtless    -0.53
Vale         -0.52
Albania      -0.52
Columb       -0.52
Positive Logits
thereof       1.20
depending     1.06
abouts        1.05
versa         0.98
Else          0.93
altogether    0.92
otherwise     0.87
omever        0.86
combination   0.83
equivalent    0.82
Activations Density: 0.457%