Explanations
instances of specific characters or symbols in the text, particularly accents and typographical marks
Negative Logits
Token     Logit
abelle    -0.17
at        -0.17
enton     -0.16
cts       -0.15
ering     -0.15
ito       -0.15
ingham    -0.15
ager      -0.15
oup       -0.15
Kauf      -0.15
Positive Logits
Token     Logit
partir    0.22
travers   0.20
la        0.18
propos    0.18
titre     0.17
ylan      0.17
tort      0.16
rox       0.16
cause     0.15
ussy      0.15
Activations Density: 0.008%