Explanations
terms related to threats or danger
Negative Logits
Token             Logit
unny              -0.72
Absorption        -0.71
addPreferredGap   -0.69
inyin             -0.66
gustar            -0.66
urator            -0.66
ודם               -0.65
adins             -0.64
abin              -0.64
crisy             -0.62
Positive Logits
Token             Logit
threat            2.31
threat            2.22
Threat            2.19
threats           2.17
Threat            2.11
Threats           2.06
threatened        1.85
Threats           1.83
threatens         1.78
threatening       1.76
Activations Density: 0.049%