INDEX
Explanations
names of political figures or entities
prominent nouns and names related to specific people, places, or distinct entities
Negative Logits
cially       -0.79
effective    -0.71
coded        -0.67
imately      -0.63
Effective    -0.61
orthy        -0.60
relevant     -0.60
staking      -0.60
Reloaded     -0.60
intensive    -0.60
Positive Logits
citiz        0.79
alike        0.75
enclave      0.74
fanbase      0.73
sylv         0.72
adan         0.70
anon         0.70
sburg        0.68
vibe         0.67
kid          0.66
Activations Density: 0.513%