Explanations
phrases indicating personal beliefs or opinions
Negative Logits
desn        -0.17
uki         -0.15
ovÃŃ        -0.15
à¥ģà¤Ĺ      -0.15
aucoup      -0.15
umba        -0.14
Ñıд         -0.14
azzo        -0.14
reverse     -0.14
Reverse     -0.14
Positive Logits
otel         0.18
hast         0.16
eln          0.16
iams         0.15
ë°©          0.15
-char        0.14
Ink          0.14
akov         0.14
dd           0.14
elo          0.14
Activations Density 0.065%
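
Several tokens in the logit lists (for example ovÃŃ and ë°©) look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The page does not say which tokenizer produced them, so the sketch below assumes a GPT-2-style byte-to-character table. The names byte_to_unicode and decode_token are hypothetical helpers written for illustration, not part of any dashboard or library API.

```python
def byte_to_unicode() -> dict:
    """Build the byte -> printable character table used by GPT-2-style
    byte-level BPE (assumed here; the source page names no tokenizer)."""
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    mapping = {b: chr(b) for b in printable}
    shift = 0
    # Every remaining byte is mapped to the next code point from U+0100 upward.
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


def decode_token(token: str) -> str:
    """Turn a vocabulary string back into readable text."""
    table = {ch: b for b, ch in byte_to_unicode().items()}
    raw = bytearray()
    for ch in token:
        if ch in table:
            raw.append(table[ch])
        else:
            # Character is outside the table (already decoded text): keep it.
            raw.extend(ch.encode("utf-8"))
    # Token boundaries can split a multi-byte character, so tolerate that.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ovÃŃ"))  # oví
```

If the assumption holds, the same call can be applied to the other non-ASCII entries in the two lists. Plain ASCII tokens such as otel or reverse pass through unchanged.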