INDEX
Explanations
people's names
mentions of the name "Alice" and related contexts
Negative Logits
emate      -0.83
iaries     -0.82
eful       -0.80
itching    -0.80
eling      -0.78
els        -0.78
ework      -0.77
aries      -0.75
ifying     -0.75
er         -0.74
Positive Logits
xon         0.91
Cooper      0.90
Alice       0.72
Wonderland  0.70
vana        0.69
Royale      0.68
pheus       0.67
hammer      0.67
BUG         0.65
retri       0.63
Activations Density 0.069%