Explanations
phrases that indicate agreement or affirmation
Negative Logits
Token    Logit
uke      -0.15
orama    -0.15
олов     -0.14
ickey    -0.14
oomla    -0.14
Fill     -0.14
culus    -0.13
         -0.13
Blanco   -0.13
Blind    -0.13
Positive Logits
Token    Logit
LENG     0.17
engu     0.15
epad     0.15
automát  0.15
raci     0.14
808      0.14
Ñĥди     0.14
uden     0.14
xious    0.14
aru      0.13
Activations Density: 0.057%