INDEX
Explanations
references to code documentation and annotations
Negative Logits
poffe
-0.45
pleaſure
-0.44
faſt
-0.43
pleaf
-0.42
<bos>
-0.41
raiſ
-0.40
fuper
-0.40
fometimes
-0.40
Fase
-0.40
itching
-0.39
Positive Logits
{@
2.66
{@
2.39
>{@
1.37
'{@
1.28
}{@
0.97
}{@
0.96
(!__
0.79
незавершена
0.78
(@
0.77
(@
0.76
Activations Density 0.223%