Explanations
The word "the" and its frequent usage in text
Head Attribution Weights
Head 0: 0.08
Head 1: 0.07
Head 2: 0.09
Head 3: 0.07
Head 4: 0.07
Head 5: 0.08
Head 6: 0.08
Head 7: 0.09
Head 8: 0.08
Head 9: 0.08
Head 10: 0.08
Head 11: 0.08
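The twelve head weights are close to uniform (1/12 ≈ 0.083), so this feature's attribution is not concentrated in any single attention head. One plausible way to compute such per-head weights, assuming the feature's decoder direction lives in the concatenated per-head output (z) space of an attention SAE and is laid out head by head, is to take the L2 norm of each head's slice and normalize the norms so they sum to 1. This is a hypothetical sketch, not the dashboard's confirmed method; `head_attribution_weights` is an illustrative name.

```python
import math


def head_attribution_weights(decoder_direction, n_heads):
    """Hypothetical helper: share of a feature's decoder norm in each head's slice.

    Assumes decoder_direction has length n_heads * d_head, laid out head by head
    (the concatenated z space of an attention SAE).
    """
    if n_heads <= 0 or len(decoder_direction) % n_heads != 0:
        raise ValueError("length must be a positive multiple of n_heads")
    d_head = len(decoder_direction) // n_heads

    # L2 norm of each head's contiguous slice of the direction.
    norms = []
    for h in range(n_heads):
        chunk = decoder_direction[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))

    # Normalize so the weights sum to 1 (all zeros if the direction is zero).
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    return [n / total for n in norms]
```

For example, `head_attribution_weights([3.0, 4.0, 0.0, 0.0], n_heads=2)` returns `[1.0, 0.0]`, while a direction whose slices all have equal norm gives 1/12 per head for 12 heads, the near-uniform pattern seen here.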
Negative Logits
Clicker      -3.29
Blossom      -2.73
aph          -2.72
Pony         -2.65
Pupp         -2.63
Boot         -2.63
Lux          -2.61
Bee          -2.54
Blend        -2.47
Candy        -2.47
Positive Logits
lie           2.66
dra           2.60
shield        2.55
dam           2.55
actionDate    2.53
shields       2.53
ship          2.50
kered         2.47
shielding     2.44
uld           2.42
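The negative and positive logit lists rank vocabulary tokens by how strongly the feature's output direction lowers or raises their logits. A minimal sketch of that direct-logit ranking projects the direction through the unembedding matrix W_U and sorts the scores. It assumes the direction has already been mapped into the residual stream (for an attention SAE this would include the W_O projection) and ignores the final layer norm, a common simplification. `top_logit_tokens` and its list-of-lists inputs are hypothetical and were chosen only to keep the example dependency-free.

```python
def top_logit_tokens(direction, unembed, vocab, k=10):
    """Hypothetical helper: rank tokens by a direction's direct effect on logits.

    direction: d_model floats (feature direction in the residual stream).
    unembed:   d_model rows, each holding len(vocab) floats (W_U).
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    """
    n_vocab = len(vocab)
    scores = [0.0] * n_vocab
    # scores = direction @ W_U, accumulated one residual dimension at a time.
    for d_i, row in zip(direction, unembed):
        for j in range(n_vocab):
            scores[j] += d_i * row[j]

    # Ascending sort: most negative scores first, most positive last.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative
```

With `direction=[1.0, 0.0]`, `unembed=[[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]]` and `vocab=["a", "b", "c"]`, the scores are 2.0, -1.0 and 0.5, so `k=2` yields `[("a", 2.0), ("c", 0.5)]` as the top positive tokens and `[("b", -1.0), ("c", 0.5)]` as the top negative ones.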
Activation Density: 0.000%
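Assuming activation density means the percentage of sampled token positions where the feature's activation is strictly positive (an assumption; the dashboard does not define it here), a reading of 0.000% at three decimal places means the feature fired on fewer than about 1 in 200,000 sampled tokens, possibly none. The helper below, `activation_density`, is a hypothetical illustration of that calculation.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, `f"{activation_density([1.0] + [0.0] * 999_999):.3f}%"` formats a feature that fires once in a million tokens as `0.000%`, showing how a rare but nonzero feature can still display as zero.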