INDEX
Explanations
references to threats and dangers
Negative Logits
Token           Logit
uinal           -0.70
scolas          -0.66
Swanson         -0.66
Absorption      -0.65
inyin           -0.64
unny            -0.63
urator          -0.63
AppCompat       -0.62
ודם             -0.61
TypedDataSet    -0.61
Positive Logits
Token           Logit
threat           1.80
threat           1.72
Threat           1.71
threats          1.69
Threat           1.62
Threats          1.62
Threats          1.39
threatened       1.39
threatens        1.37
threatening      1.33
Activations Density: 0.065%