INDEX
Explanations
words related to geographic locations, particularly cities or countries
the mention of specific geographic locations
Negative Logits
enegger    -0.77
ãĤ¢ãĥ«    -0.76
Ö¼    -0.75
ãĥ¼ãĥ³    -0.72
女    -0.69
UTION    -0.69
é¾įå¥ij士    -0.67
alf    -0.66
ãĤ¼ãĤ¦ãĤ¹    -0.64
ãĥŃ    -0.64
Positive Logits
reth    0.95
lette    0.91
vre    0.89
vernment    0.88
ng    0.87
mi    0.87
seless    0.84
hou    0.84
chet    0.83
lish    0.82
Activations Density: 0.015%