Explanations
geographical locations and nationalities
Negative Logits
	0.75
ח	0.69
is	0.68
ק	0.67
માં	0.65
정이	0.60
in	0.56
两	0.56
ك	0.56
동일	0.55
POSITIVE LOGITS
-	0.85
:	0.75
٥	0.71
il	0.70
delà	0.67
á	0.64
el	0.63
៥	0.63
y	0.62
н	0.61
Activations Density 0.145%