Explanations
references to influential figures and their actions within a socio-political context
Negative Logits
Token        Logit
ść           -0.17
-Men         -0.16
ROID         -0.16
داخ          -0.15
ーン         -0.15
šli          -0.15
serr         -0.15
Lanka        -0.15
suppress     -0.14
.localized   -0.14
Positive Logits
Token        Logit
inder        0.33
Singh        0.29
Sing         0.27
INDER        0.27
sing         0.27
Gill         0.26
jit          0.25
bir          0.24
pre          0.23
Parm         0.23
Activations Density 0.057%