Explanations
phrases indicating size or significance
Negative Logits
elman    -0.16
dlg      -0.16
715      -0.15
aan      -0.15
595      -0.15
ibu      -0.15
abis     -0.15
ing      -0.14
an       -0.14
inger    -0.14
Positive Logits
ULD      0.16
Jaune    0.16
ouver    0.14
adele    0.14
bern     0.14
¶        0.13
uptools  0.13
vit      0.13
”↵↵      0.13
関連     0.13
Activation Density: 0.017%