INDEX
Explanations
occurrences of the word "le" and its variations
Negative Logits
ri        -0.18
resse     -0.17
ga        -0.16
dre       -0.16
re        -0.15
illes     -0.15
rought    -0.15
gli       -0.14
illet     -0.14
éra       -0.14
Positive Logits
opard      0.22
aky        0.20
aving      0.20
aping      0.19
eks        0.17
wd         0.17
eward      0.17
ettle      0.17
islation   0.17
prech      0.17
Activations Density: 0.034%