INDEX
Explanations
the word "so" occurring with high activation values
phrases that express a sense of negation or contradiction
Negative Logits
ãĥ¼ãĥĨãĤ£
-0.85
é¾
-0.69
MAP
-0.68
ãĤ¼ãĤ¦ãĤ¹
-0.64
annotations
-0.63
ayne
-0.60
{:
-0.57
SHARES
-0.57
DERR
-0.57
scan
-0.56
Positive Logits
much
0.86
aked
0.84
oths
0.82
oooo
0.81
ppy
0.80
othes
0.77
zin
0.76
akers
0.74
icably
0.73
oooooooo
0.73
Activations Density 0.061%