INDEX
Explanations
phrases and words related to dangers and potential harm
Negative Logits
Token        Logit
uinal        -0.67
inyin        -0.63
Absorption   -0.62
idase        -0.62
scolas       -0.61
{~           -0.61
ertos        -0.60
AILED        -0.60
ayrı         -0.59
ombus        -0.59
Positive Logits
Token        Logit
threat       1.85
threat       1.80
Threat       1.75
threats      1.75
Threats      1.69
Threat       1.68
threatened   1.50
Threats      1.47
threatens    1.47
threaten     1.40
Activations Density 0.040%
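
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 4 of 10,000 token positions.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.040% would mean the feature is active on roughly 4 in every 10,000 tokens, which fits a sparse feature tied to a narrow set of words.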