INDEX
Explanations
words related to physical actions and violence
conjunctions and their patterns in phrases
Negative Logits
ortium      -0.76
iciary      -0.70
XY          -0.68
Sloan       -0.67
xus         -0.63
ographed    -0.62
Scully      -0.62
Guatem      -0.62
CN          -0.61
Guerrero    -0.61
Positive Logits
thur        0.68
tons        0.68
edge        0.67
itsch       0.67
oddy        0.67
argon       0.67
chest       0.67
IPM         0.67
anim        0.67
watching    0.66
Activations Density: 0.452%