INDEX
Explanations
pronouns and definite articles in the text
Negative Logits
aget       -0.15
upertino   -0.15
ho         -0.15
silver     -0.14
»          -0.13
brightest  -0.13
innamon    -0.13
slee       -0.13
Buen       -0.13
Wikipedia  -0.13
Positive Logits
odable  0.16
andel   0.16
£       0.15
erver   0.15
üle     0.15
elan    0.14
ortex   0.14
abela   0.14
alon    0.14
rych    0.14
Activations Density 0.011%