Explanations
terms related to fatal consequences or dangers
Negative Logits
burgh       -0.17
motion      -0.15
tte         -0.15
@student    -0.15
eka         -0.14
Motion      -0.14
motions     -0.14
olated      -0.14
ingen       -0.14
bons        -0.14
Positive Logits
flaw            0.22
blow            0.20
dose            0.19
consequences    0.19
flaws           0.18
flawed          0.18
lest            0.16
ities           0.16
blows           0.16
consequence     0.16
Activations Density 0.018%