INDEX
Explanations
proper nouns or locations, specifically focusing on names of countries
mentions of geographic locations or proper nouns related to places
Negative Logits
deck          -0.71
bread         -0.69
worn          -0.67
kid           -0.66
low           -0.65
spot          -0.65
iT            -0.64
Attribution   -0.64
geist         -0.62
roofs         -0.62
Positive Logits
ð             1.01
cci           0.96
ñ             0.95
ÄŁ            0.86
ccess         0.83
zzi           0.82
qt            0.80
pload         0.79
ño            0.79
Kenobi        0.78
Activations Density: 0.041%