Explanations
quotes and references to classic literature
Negative Logits
ypo      -0.17
дол      -0.17
agr      -0.16
urer     -0.15
opper    -0.15
aub      -0.15
anko     -0.14
Shore    -0.14
ucc      -0.14
777      -0.14
Positive Logits
Hed      0.17
Bunny    0.16
anford   0.16
SENS     0.16
roman    0.15
etten    0.14
<quote   0.14
ecz      0.14
moderne  0.14
Bench    0.14
Activations Density: 0.002%