Explanations
Occurrences of non-zero values in contexts likely related to health or performance metrics.
Negative Logits
tershire        -0.98
GenerationType  -0.85
accoon          -0.79
外部リンク           -0.77
persegu         -0.73
colorPrimary    -0.72
lotz            -0.72
owiak           -0.71
ؤلاء            -0.71
dalena          -0.69
Positive Logits
s                0.80
[toxicity=0]     0.76
o                0.74
er               0.74
                 0.70
mantec           0.65
hline            0.63
ares             0.63
↵                0.63
anolamine        0.62
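The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way such lists are produced for sparse-autoencoder features is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and keep the top and bottom k tokens. Whether this dashboard uses exactly that method, and how its values are scaled, is an assumption not stated here. The sketch below uses hypothetical helper names (logit_effects, top_k) and toy numbers rather than real weights.

```python
# Hypothetical helpers; toy numbers, not real model weights.

def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the (assumed) feature decoder
    direction with that token's unembedding column.

    decoder_dir: list of d_model floats
    unembed:     d_model rows, each a list of len(vocab) floats
    vocab:       list of token strings, one per unembedding column
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir and unembed must share d_model")
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores


def top_k(scores, k, positive=True):
    """Return the k highest (positive=True) or lowest (positive=False) scores."""
    return sorted(scores, key=lambda pair: pair[1], reverse=positive)[:k]


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    decoder_dir = [1.0, -1.0]
    unembed = [
        [0.5, 0.1, -0.2],
        [0.1, 0.4, 0.3],
    ]
    scores = logit_effects(decoder_dir, unembed, vocab)
    print("positive:", top_k(scores, 1))
    print("negative:", top_k(scores, 2, positive=False))
```

In this toy case token "a" gets the largest positive score and "c" the most negative, mirroring how the dashboard separates its positive and negative lists.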
Activation density: 0.033%
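Activation density is read here, as an assumption, as the fraction of token positions at which the feature's activation is above zero. Under that reading, 0.033% means the feature fires on roughly 33 of every 100,000 tokens. A minimal sketch of the calculation, with the hypothetical function name activation_density:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds threshold.

    Assumes density = (# positions with activation > threshold) / (# positions).
    """
    total = 0
    active = 0
    for act in activations:
        total += 1
        if act > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return active / total


if __name__ == "__main__":
    # Toy data: 33 firing positions out of 100,000 tokens.
    acts = [1.5] * 33 + [0.0] * (100_000 - 33)
    print(f"{activation_density(acts):.3%}")
```

Running it prints 0.033%, matching the figure above for this toy input.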