Explanations
following prompts or structured text
Negative Logits
!”        0.68
azol      0.66
Büh       0.64
Smol      0.60
absol     0.59
av        0.58
!”,       0.58
players   0.58
撿        0.57
殯        0.56
Positive Logits
>         1.91
>         1.77
>>        1.64
>>        1.59
>;        1.52
>*        1.48
]>        1.47
?>        1.47
>\        1.44
>>>       1.43
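Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that method. It is not confirmed by this dashboard, and the inputs (`decoder_vec`, `unembed`, `vocab`) are hypothetical placeholders. It uses plain nested lists so it runs without extra libraries.

```python
def top_logits(decoder_vec, unembed, vocab, k=10):
    """Hypothetical sketch: rank tokens by decoder_vec . unembed[:, t].

    decoder_vec: list of length d_model (the feature's decoder direction)
    unembed: d_model x vocab_size matrix as nested lists
    vocab: list of token strings, one per unembedding column
    Returns (positive, negative): the k highest- and k lowest-scoring tokens.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column
        s = sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        scores.append((token, s))
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]                  # most negative scores first
    positive = scores[::-1][:k]            # most positive scores first
    return positive, negative
```

With a toy two-dimensional model and three tokens, `top_logits([1.0, 0.0], [[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]], [">", "azol", "x"], k=1)` ranks ">" as the top positive token and "azol" as the top negative token.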
Activation Density: 0.111%
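Activation density is usually read as the fraction of token positions on which the feature fires. At 0.111%, that works out to roughly 1.1 activating tokens per 1,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the threshold are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of positions where activation > threshold."""
    if not activations:
        return 0.0
    # Count token positions where the feature is active
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)
```

For example, `activation_density([0.0, 0.0, 0.5, 0.0, 1.2])` returns 0.4, meaning the feature is active on 2 of 5 positions (40%).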