Explanations
mentions of clothing or uniforms worn by individuals
Negative Logits
intrins       -0.72
selves        -0.70
cities        -0.70
settlements   -0.68
affiliates    -0.68
members       -0.67
yond          -0.66
theless       -0.65
allowable     -0.64
erness        -0.64
Positive Logits
whom          0.81
otto          0.72
ape           0.67
stole         0.67
chief         0.66
エル          0.65
ヘ            0.65
gorilla       0.65
throne        0.64
sunglasses    0.63
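Lists like the two above are commonly produced by a logit-lens readout: project the feature's output (decoder) direction through the model's unembedding matrix and keep the tokens with the lowest and highest resulting logits. The sketch below assumes that method. The dashboard does not say how it computes or scales its scores. `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names, and the toy matrix is invented for illustration.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, logit) pairs for one feature.

    decoder_dir: shape (d_model,), the feature's decoder (output) direction (assumed).
    W_U: shape (d_model, vocab_size), the model's unembedding matrix (assumed).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    logits = decoder_dir @ W_U          # direct effect on every token's logit
    order = np.argsort(logits)          # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 made-up entries.
vocab = ["whom", "cities", "gorilla", "selves"]
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.3, 0.1, -0.4, 0.2]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # most suppressed tokens first
print(pos)  # most promoted tokens first
```

Only the ranking is meaningful here. The absolute values depend on how the decoder direction and unembedding are normalized, which the page does not specify.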
Activation Density: 0.297%
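Activation density is most plausibly the share of tokens in some evaluation sample on which the feature fires. At 0.297%, that works out to roughly 3 tokens in every 1,000. The sketch below assumes that "fires" means an activation strictly above zero. The dashboard states neither its threshold nor its sample. `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold.

    acts: one activation value per token for a single feature.
    threshold: assumed firing cutoff (0.0 by default).
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Eight tokens, two of which activate the feature -> 25% density.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))
```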