Explanations
Mentions of "pros" and, to a lesser extent, "cons" in various contexts
Negative Logits
endon      -0.18
een        -0.16
cene       -0.15
ÃĹ↵↵       -0.15
et         -0.15
utschen    -0.14
лиз        -0.14
uations    -0.14
ernote     -0.14
emand      -0.14
Positive Logits
pective     0.29
pects       0.26
pector      0.26
ively       0.23
pros        0.19
acco        0.18
Pros        0.18
peri        0.17
pectives    0.17
ely         0.17
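The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such a ranking could be produced. It assumes the common SAE-dashboard convention that a feature's logit effect is its decoder direction multiplied by the model's unembedding matrix. The toy vectors, the matrix, and the function names (`logit_effects`, `top_and_bottom`) are hypothetical and are not taken from this dashboard.

```python
# Hypothetical toy example. Assumption: a feature's logit effect on token t is
# its decoder direction projected through the unembedding matrix:
#     effect[t] = sum_i decoder_dir[i] * W_U[i][t]

def logit_effects(decoder_dir, W_U):
    """Return one logit-effect value per vocabulary token (W_U is d_model x d_vocab)."""
    d_vocab = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(d_vocab)
    ]

def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

vocab = ["pros", "Pros", "cons", "endon", "een"]
decoder_dir = [1.0, 0.5]       # toy d_model = 2
W_U = [                        # toy unembedding, shape d_model x d_vocab
    [0.3, 0.2, 0.1, -0.2, -0.1],
    [0.1, 0.2, 0.0, 0.0, -0.1],
]

effects = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(effects, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

With these toy numbers, "pros" and "Pros" end up as the top positive tokens and "endon" and "een" as the most negative. Real dashboards apply the same sort over the full vocabulary.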
Activation Density: 0.012%
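A density of 0.012% means the feature fires on roughly 12 of every 100,000 tokens, so it is very sparse. Here is a minimal sketch of the calculation. It assumes density is the fraction of tokens with a strictly positive activation, because the dashboard does not state its exact threshold. The function name `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch. Assumption: a feature "fires" on a token when its
# activation is strictly greater than zero.

def activation_density(activations):
    """Fraction of tokens on which the feature's activation is > 0."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)

# Toy data: 100,000 tokens, with the feature firing on 12 of them.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 8_000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.012%
```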