Explanations
countries and nationalities
Negative Logits
u          0.64
patitth    0.55
άνθρω      0.54
ção        0.49
iéndose    0.47
orgánica   0.47
지에       0.46
i          0.46
iť         0.46
ul         0.46
Positive Logits
be         0.52
是         0.46
[token not shown]  0.46
I          0.45
I          0.45
ру         0.45
స్         0.44
ON         0.44
س          0.43
は         0.42
Activations Density: 0.992%