INDEX
Explanations
instances of names or terms related to South Asian culture or context
Negative Logits
nesday     -0.93
ruary      -0.88
abase      -0.83
hement     -0.68
nyder      -0.67
ascript    -0.63
eatures    -0.63
risome     -0.62
owship     -0.61
irlf       -0.60
Positive Logits
oga        0.79
vati       0.71
chal       0.68
Wrap       0.67
inia       0.66
)</        0.65
acha       0.63
Bron       0.63
alid       0.60
ae         0.60
Activations Density 0.012%