INDEX
Explanations
occurrences of the word "to" in various contexts
Negative Logits
outl        -0.78
ividual     -0.74
mble        -0.70
averages    -0.65
raising     -0.64
appropri    -0.63
oun         -0.63
vein        -0.63
working     -0.63
ancies      -0.61
Positive Logits
ilet        1.11
jo          1.04
ppo         1.03
fen         1.00
pper        0.93
ppa         0.93
pping       0.93
fore        0.92
ffee        0.91
ven         0.91
Activations Density: 0.007%