Explanations
special characters and website links
sequences of special characters, specifically multiple instances of ">>>"
Negative Logits
Token     Logit
ahime     -0.82
words     -0.79
wagon     -0.79
tackle    -0.77
orial     -0.75
ible      -0.75
rive      -0.75
ript      -0.75
liest     -0.74
laus      -0.73
Positive Logits
Token                     Logit
>>>>>>>>                  1.73
>>>>                      1.59
>>>                       1.46
>>>                       1.41
ertodd                    0.98
âĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪâĸĪ  0.91
¶                         0.90
_>                        0.89
âĸĵ                       0.88
>>                        0.86
Activations Density 0.009%
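
Some positive-logit tokens above (for example "âĸĪ" and "âĸĵ") look like mojibake. One plausible reading, which the page itself does not confirm, is that they are raw vocabulary strings from a GPT-2-style byte-level BPE tokenizer, where every byte is shown as a stand-in printable character. The sketch below rebuilds that byte-to-character table and inverts it. The helper name decode_token is hypothetical and used only for illustration.

```python
def bytes_to_unicode():
    """Byte (0-255) -> printable character table, as used by GPT-2-style
    byte-level BPE vocabularies. Assumption: the tokens on this page come
    from such a vocabulary; the page does not say which model it is."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Invert the table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: map a displayed vocabulary string back to text.

    Raises KeyError for characters outside the byte-level alphabet.
    Byte sequences that are not valid UTF-8 decode to U+FFFD.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00e2\u0138\u012a", "\u00e2\u0138\u0135", ">>>"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, "âĸĪ" decodes to the full-block character "█" and "âĸĵ" to the dark-shade character "▓", so the top positive logits would consist of ">" runs and block-drawing characters. The lone "¶" would correspond to a single byte that is not valid UTF-8 on its own, so it has no standalone text form.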