Explanations
terms referring to large or giant entities
Negative Logits
erate      -0.16
erman      -0.15
yonel      -0.14
σταση      -0.14
chter      -0.14
reesome    -0.13
Dillon     -0.13
oulos      -0.13
ourcem     -0.13
ral        -0.13
Positive Logits
-sized     0.18
elerik     0.16
onta       0.15
/small     0.15
uer        0.15
-size      0.15
olin       0.15
eye        0.15
anner      0.14
redient    0.14
Activation Density: 0.013%