INDEX
Explanations
references to historical political figures and events in India
Negative Logits
elan      -0.17
tal       -0.15
shaved    -0.14
ỡ         -0.14
kk        -0.14
uga       -0.14
ij        -0.14
ecute     -0.14
wonders   -0.14
館        -0.14
Positive Logits
isers     0.18
venes     0.16
owie      0.16
zdy       0.15
Hue       0.15
amu       0.15
vine      0.15
Blaze     0.14
iring     0.14
reluct    0.14
Activation Density: 0.056%