Explanations
terms related to physical proximity or closeness
Negative Logits
gan        -0.18
VERR       -0.16
obox       -0.15
urator     -0.15
auge       -0.14
andler     -0.14
oulder     -0.14
oretical   -0.14
iei        -0.14
yı         -0.14
Positive Logits
lessly     0.22
abouts     0.22
shore      0.20
s          0.20
ish        0.20
ness       0.17
casting    0.16
liest      0.16
-ish       0.15
igel       0.15
Activations Density: 0.024%