Negative Logits
classifiers   0.48
ennemis       0.42
enemies       0.40
Enemies       0.39
classifiers   0.38
няют          0.38
inflammatory  0.38
캅            0.37
Reaktion      0.37
igest         0.36
Positive Logits
(blank)       0.91
(blank)       0.90
(blank)       0.77
outreach      0.76
(blank)       0.74
(blank)       0.73
(blank)       0.71
(blank)       0.71
(blank)       0.70
Outreach      0.68
Activations Density: 0.015%