Explanations
describing negative states of characters
Negative Logits
astounding     0.99
withstanding   0.98
včetně         0.95
oretical       0.95
vyd            0.95
название       0.94
Herstellung    0.93
unbelievable   0.93
př             0.92
opet           0.92
Positive Logits
glers          1.16
chefs          0.99
lers           0.98
chef           0.96
households     0.96
ному           0.91
peasants       0.91
läger          0.90
heden          0.90
IANS           0.89
Activations Density: 0.197%