Explanations
geographical locations or entities
the term "gent" related to social or diplomatic contexts
Negative Logits
WT            -0.77
ADE           -0.75
IRO           -0.71
oufl          -0.71
esty          -0.68
ecause        -0.68
senal         -0.68
Downloadha    -0.66
qqa           -0.65
inki          -0.65
Positive Logits
gent           1.49
lus            0.80
rant           0.75
rants          0.75
ente           0.74
nesses         0.72
gently         0.71
rification     0.70
龍契士         0.70
wick           0.69
Activations Density: 0.006%