Explanations
phrases indicating relationships or connections between entities
Negative Logits
two      -1.08
two      -0.91
deux     -0.86
three    -0.85
TWO      -0.82
dwóch    -0.81
zwei     -0.78
TWO      -0.72
trois    -0.72
THREE    -0.71
Positive Logits
brightest    0.71
loudest      0.67
deadliest    0.67
stesso       0.65
stessi       0.64
mismos       0.63
heaviest     0.63
terbesar     0.60
misma        0.59
strongest    0.59
Activations Density: 0.079%