Explanations
References to the PLOS publication
Negative Logits
Token       Logit
fett        -0.41
itschrift   -0.41
colhead     -0.41
zemi        -0.40
dė          -0.40
noDo        -0.40
matite      -0.40
Pré         -0.39
Musique     -0.39
kwe         -0.39

Positive Logits
Token       Logit
los          2.67
LOS          1.79
lose         1.42
LOS          1.40
loss         1.40
Los          1.30
losa         1.26
los          1.24
losen        1.16
Los          1.15
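The two logit tables rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of how such lists can be produced, assuming (as is common for SAE feature dashboards, but not stated on this page) that the values are the feature's decoder direction projected through the model's unembedding matrix `W_U` of shape (d_model, vocab_size). The function name `top_logit_tokens` and all numbers below are hypothetical, and details such as final-layer-norm scaling are not modeled.

```python
import numpy as np

def top_logit_tokens(decoder_vec, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption: the feature's direct logit effect is decoder_vec @ W_U.
    """
    logits = decoder_vec @ W_U            # shape: (vocab_size,)
    order = np.argsort(logits)            # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
vocab = ["los", "loss", "fett", "Pré"]
W_U = np.array([[2.0, 1.0, -0.5, 0.0],
                [0.5, 0.5, 0.0, -1.2]])
decoder_vec = np.array([1.0, 0.5])
pos, neg = top_logit_tokens(decoder_vec, W_U, vocab, k=2)
print(pos)  # most-promoted tokens
print(neg)  # most-suppressed tokens
```

Repeated surface forms in the tables (for example `los` and `LOS` appearing twice) are plausibly distinct tokenizer entries, such as variants with and without a leading space, which is why each gets its own row.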
Activation Density: 0.010%
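An activation density of 0.010% means the feature fires on about 1 in 10,000 tokens (0.010% = 10^-4). A small sketch of that arithmetic, assuming density is the fraction of tokens on which the feature's activation is above zero over whatever dataset the dashboard sampled. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Hypothetical sample: the feature fires on exactly 1 of 10,000 tokens.
acts = np.zeros(10_000)
acts[42] = 3.1
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.010%
```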