Explanations
numerical values and data points
NEGATIVE LOGITS
Cuth          -0.81
ؤلاء          -0.80
tershire      -0.75
getX          -0.75
Ibarra        -0.71
ricultural    -0.70
Majefty       -0.70
Muth          -0.70
($('#         -0.69
Thors         -0.68
POSITIVE LOGITS
↵             1.15
</tr>         1.06
↵↵            1.03
())))         1.01
[toxicity=0]  0.96
">)</         0.93
())));        0.91
>)</          0.90
])))          0.87
']))          0.87
Activations Density 0.025%