INDEX
Explanations
proper nouns, particularly names and titles
Negative Logits
avin      -0.18
orsch     -0.15
acht      -0.15
obao      -0.15
roker     -0.15
anke      -0.15
handlers  -0.15
Rig       -0.14
eldon     -0.14
likes     -0.14
Positive Logits
umhur     0.25
ICA       0.19
ibr       0.18
elic      0.17
edd       0.17
wyn       0.17
IBUT      0.17
udo       0.16
ISR       0.16
afari     0.15
Activations Density: 0.021%