INDEX
Explanations
proper nouns and usernames
words related to food and cuisine
Negative Logits
Instr      -0.85
Ninth      -0.80
Mayo       -0.74
Calder     -0.73
Hert       -0.70
predec     -0.70
Seym       -0.70
Liver      -0.69
Borders    -0.69
Gong       -0.68
Positive Logits
@          1.37
_          1.21
":{"       1.08
oft        1.06
_-_        1.04
guy        1.02
podcast    1.01
666        0.99
Profile    0.96
clinton    0.95
Activations Density 0.270%