INDEX
Explanations
terms indicating mildness or low severity
Negative Logits
الحره    -0.73
utafitiHapana    -0.71
فريبيس    -0.70
BeginInit    -0.69
sockaddr    -0.66
RenderAtEndOf    -0.65
виправивши    -0.65
:][    -0.64
aarrggbb    -0.63
перь    -0.62
Positive Logits
mild    1.75
mild    1.55
lightly    1.49
Mild    1.43
lightweight    1.42
modest    1.35
légère    1.34
Mild    1.32
faint    1.31
gentle    1.30
Activations Density 0.100%
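
One way to read the density figure above: assuming "Activations Density" means the fraction of sampled tokens on which the feature activates above zero (the page itself does not define it), 0.100% corresponds to roughly one firing token per thousand. A minimal sketch under that assumption follows; the function name, the threshold parameter, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: the dashboard does not state how density is computed.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3%}")  # prints 0.100%
```

A density this low is consistent with a narrow feature that fires only on a small set of related tokens, such as the "mild"-type terms in the positive logits.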