INDEX
Explanations
sequences of repeated letters
expressions of excitement or surprise
Negative Logits
Token           Logit
ourse           -0.76
"]=>            -0.64
contribut       -0.62
ãĥĹ             -0.61
substitution    -0.60
IAL             -0.59
vacated         -0.57
Redux           -0.57
occup           -0.57
Hasan           -0.57
Positive Logits
Token           Logit
mmmm            1.04
mmm             0.91
oooo            0.91
kidding         0.87
ooo             0.87
ahah            0.84
hhhh            0.82
aaaa            0.79
hhh             0.79
!!!!!           0.78
Activations Density: 0.355%