INDEX
Explanations
words related to destructive or violent actions involving physical harm
Negative Logits
Token              Logit
cluding            -0.72
Hist               -0.66
dds                -0.66
hm                 -0.63
history            -0.62
inventoryQuantity  -0.61
CTV                -0.61
since              -0.60
EY                 -0.60
Terms              -0.59
Positive Logits
Token              Logit
crept              1.13
emerged            1.11
emerges            1.10
sprang             1.10
collided           1.09
suddenly           1.04
popped             1.03
intervened         1.02
approached         1.01
swoop              0.98
Activations Density: 0.394%