INDEX
Explanations
phrases related to threats and dangers
negative descriptors related to safety or risk
Negative Logits
Token     Logit
krit      -0.64
ija       -0.62
leneck    -0.61
inas      -0.60
ordon     -0.60
AFP       -0.58
odox      -0.57
lp        -0.56
etsk      -0.55
miah      -0.54
Positive Logits
Token     Logit
to        1.62
to        1.44
To        1.18
TO        1.17
To        1.15
thereto   1.11
unto      0.85
ta        0.83
TO        0.78
toc       0.56
Activations Density: 0.974%