INDEX
Explanations
words related to the concept of self-harm or self-endangerment
terms related to the concept of "antics" or "antagonistic behavior."
Negative Logits
Roh           -0.72
______        -0.69
shake         -0.66
kil           -0.66
center        -0.64
Haz           -0.63
glove         -0.62
hum           -0.61
chancellor    -0.61
hid           -0.61
Positive Logits
antic         4.75
antically     2.18
antics        2.12
ANT           1.75
antis         1.74
ant           1.54
antine        1.41
anta          1.38
ancy          1.36
antes         1.25
Activations Density 0.009%