INDEX
Explanations
Phrases related to familiarity or personal connections
Instances of the word "know" and its variations
Negative Logits
voucher        -0.74
pex            -0.74
evin           -0.72
issions        -0.71
isco           -0.70
phrine         -0.70
privatization  -0.67
Antar          -0.67
itter          -0.67
inance         -0.66
Positive Logits
lege           1.09
ledged         1.00
ledge          0.97
abouts         0.78
edge           0.78
ABOUT          0.75
beforehand     0.75
LED            0.74
how            0.74
nothing        0.73
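Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix, then ranking tokens by the result. The page does not say how its values were computed, so treat this as a minimal sketch that assumes that method. The function names, the toy weights, and the four-token vocabulary are all hypothetical and are not taken from the real model.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Dot a feature's decoder direction with every token's unembedding column.

    `unembed` is assumed (hypothetically) to be laid out as [d_model][vocab_size].
    """
    d_model = len(decoder_vec)
    return {
        tok: sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical numbers).
vocab = ["ledge", "how", "voucher", "pex"]
decoder_vec = [1.0, -0.5]
unembed = [[0.9, 0.6, -0.4, -0.2],
           [-0.2, 0.1, 0.6, 0.8]]

positive, negative = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print(positive)  # ledge (about 1.0), then how (about 0.55)
print(negative)  # voucher (about -0.7), then pex (about -0.6)
```

Real dashboards may also fold in normalization steps before the projection. The ranking step stays the same either way.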
Activation Density: 0.054%
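An activation density of 0.054% means the feature fires on roughly 5 or 6 tokens in every 10,000. The sketch below assumes density is the share of tokens in some evaluation sample on which the feature's activation is above zero. The dashboard does not state the sample or the threshold it used. The helper name and the synthetic 100,000-token sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic sample: 100,000 tokens, and the feature fires on 54 of them.
acts = [0.0] * 100_000
for i in range(0, 54 * 1000, 1000):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.054%
```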