Explanations
terms and phrases related to neutrality in various contexts
Negative Logits
tradição       -0.51
tradición      -0.51
kaldı          -0.48
Vordergrund    -0.47
IBOutlet       -0.46
Verantwortung  -0.45
romántica      -0.44
guiente        -0.43
afecto         -0.43
Italij         -0.42
Positive Logits
wool   0.75
Rule   0.69
Gray   0.68
Gray   0.68
Rule   0.68
gray   0.67
GRAY   0.63
gray   0.62
Wool   0.60
Rules  0.60
Activation Density: 0.189%
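Activation density is commonly reported as the fraction of token positions on which a feature's activation is nonzero. The sketch below shows that calculation. It assumes a flat list of per-token activations and a firing threshold of 0, because the dashboard states neither its dataset nor its threshold. `activation_density` is a hypothetical helper, not a Neuronpedia or SAELens API.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds threshold.

    Assumption: a token "activates" the feature when its activation is > threshold
    (default 0.0); the dashboard does not state the threshold it uses.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2 firing tokens out of 1,000 positions.
acts = [0.0] * 998 + [1.3, 0.7]
print(f"{activation_density(acts):.3%}")  # 0.200%
```

Under this definition, a density of 0.189% means the feature fires on roughly 1.89 of every 1,000 tokens.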