INDEX
Explanations
winning dominance superiority
Negative Logits
EQU          0.47
equal        0.46
ambitious    0.46
wę           0.46
uguale       0.45
tentar       0.44
inversa      0.43
igual        0.42
decentral    0.42
Attempt      0.42
Positive Logits
overwhelmingly    0.78
圧倒              0.78
overwhelming      0.77
overpowering      0.69
dominance         0.66
domination        0.66
dominate          0.66
superiority       0.64
victorious        0.64
dominates         0.64
Activations Density 0.298%