Explanations
expressions of disbelief or skepticism about societal issues
Negative Logits
aux          -0.17
bara         -0.15
eval         -0.14
æ¿           -0.14
ãĥªãĥ¼ãĤº    -0.14
Inn          -0.13
aight        -0.13
angan        -0.13
alez         -0.13
_callable    -0.13
Positive Logits
竣           0.20
anyone       0.17
somehow      0.17
yte          0.16
oui          0.16
éra          0.16
PERT         0.15
à¤ĩतन        0.15
STILL        0.15
å¦ĤæŃ¤       0.15
Activations Density 0.217%
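
Several tokens in the logit lists above (for example ãĥªãĥ¼ãĤº and å¦ĤæŃ¤) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes they follow the GPT-2-style byte-to-unicode mapping; that is an assumption about the tokenizer, not something this page states. The helper names `bytes_to_unicode` and `decode_token` are illustrative only.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2-style mapping from byte values to printable characters."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text (best effort)."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytearray()
    for ch in token:
        if ch in char_to_byte:
            raw.append(char_to_byte[ch])
        else:
            # Character is outside the mapping (already decoded): keep it as UTF-8.
            raw.extend(ch.encode("utf-8"))
    # Incomplete byte sequences (partial characters) become U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥªãĥ¼ãĤº", "å¦ĤæŃ¤", "aux"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, a token that holds only part of a multi-byte character (such as æ¿) cannot be decoded on its own and comes back as a replacement character.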