INDEX
Negative Logits
WebElementEntity   -0.78
Leighton           -0.77
Tach               -0.77
\}\\               -0.74
Dowling            -0.73
gynhyrchwyd        -0.71
例文帳に追加       -0.71
element            -0.70
Dresden            -0.69
"]);               -0.69
Positive Logits
harm               1.23
harm               1.23
Harm               1.17
Harm               1.16
harms              1.11
Harms              1.09
Harmful            1.05
Hurt               1.02
harmed             0.91
Hurt               0.88
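Dashboards like this one typically show the k most negative and k most positive entries of a per-token logit-effect vector. Computing that vector (often the feature's decoder direction projected through the unembedding) is an assumption here and is not stated in the source. The sketch below shows only the ranking step. It uses a hypothetical `logit_effects` mapping filled with a few values copied from the lists above. Duplicated rows such as "harm" are left out because they are probably distinct tokens whose whitespace was lost.

```python
import heapq


def top_and_bottom(
    logit_effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    `logit_effects` is a hypothetical token -> logit-effect mapping.
    """
    items = logit_effects.items()
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


# Hypothetical subset of the values listed above.
logit_effects = {
    "WebElementEntity": -0.78,
    "Leighton": -0.77,
    "Dresden": -0.69,
    "harms": 1.11,
    "Harmful": 1.05,
    "harmed": 0.91,
}
neg, pos = top_and_bottom(logit_effects, 2)
print(neg)  # [('WebElementEntity', -0.78), ('Leighton', -0.77)]
print(pos)  # [('harms', 1.11), ('Harmful', 1.05)]
```

`heapq.nsmallest`/`nlargest` avoid a full sort, which matters when the vocabulary has tens of thousands of tokens and only the top ten of each sign are shown.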
Activation density: 0.008%
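Activation density is commonly reported as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. The threshold and the token sample behind the 0.008% figure are not given here, so both are assumptions. Under that reading, 0.008% is about 8 activating tokens per 100,000. The sketch below assumes that definition:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes "fires" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 8 firing tokens among 100,000.
sample = [0.0] * 99_992 + [1.5] * 8
print(f"{activation_density(sample):.3f}%")  # 0.008%
```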