INDEX
Explanations
occurrences of the word "of"
Negative Logits
olumn     -0.15
lich      -0.14
ileo      -0.14
anova     -0.14
illion    -0.13
ac        -0.13
roz       -0.13
Guth      -0.13
gorit     -0.13
owers     -0.13
Positive Logits
few        0.20
Europe     0.19
LETE       0.17
America    0.16
Msp        0.15
署         0.15
few        0.14
America    0.14
Few        0.14
Few        0.14
Activation Density: 0.046%
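The dashboard does not say how density is measured. A common convention, which is only an assumption here, is the fraction of tokens on which the feature's activation is nonzero. Under that reading, 0.046% is 0.00046, or about 460 firing tokens per million. The sketch below works through that arithmetic. Both helper names, `activation_density` and `expected_firings`, are hypothetical and were made up for illustration.

```python
def activation_density(activations):
    """Fraction of tokens with a positive activation.

    Assumes density is defined as (tokens where the feature fires) / (all tokens).
    This convention is assumed, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


def expected_firings(density_percent, num_tokens):
    """Convert a density given as a percentage into an expected count of firing tokens."""
    return density_percent / 100.0 * num_tokens


# Toy example: the feature fires on 1 of 4 tokens, so density is 0.25 (25%).
print(activation_density([0.0, 0.0, 1.2, 0.0]))

# At 0.046% density, roughly 460 of every 1,000,000 tokens would activate the feature.
print(round(expected_firings(0.046, 1_000_000)))
```

At this density the feature is very sparse, so these estimates only hold on average over large token samples.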