INDEX
Explanations
repeated use of the word "the."
Negative Logits
Token        Logit
ihar         -0.16
oun          -0.14
urrect       -0.14
ponsible     -0.14
IRD          -0.13
OTHERWISE    -0.13
orny         -0.13
nar          -0.13
ing          -0.13
è¶           -0.13
Positive Logits
Token        Logit
few          0.27
few          0.21
Few          0.19
pret         0.18
Few          0.17
many         0.16
nhiều        0.15
liv          0.15
rare         0.15
emap         0.15
Activations Density 0.059%