Explanations
mentions of specific groups of people, particularly citizens of different places
references to citizens or residents in various contexts
Negative Logits
Token        Logit
certs        -0.69
attribute    -0.69
edi          -0.68
agi          -0.67
neurot       -0.66
handshake    -0.66
phabet       -0.65
continu      -0.63
thumbs       -0.63
projector    -0.62
Positive Logits
Token          Logit
England        0.96
Janeiro        0.96
Philadelphia   0.93
Pennsylvania   0.93
Alexandria     0.92
Burlington     0.92
adelphia       0.90
Buenos         0.90
Plymouth       0.90
Ethiopia       0.89
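Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. This page does not say how its values were computed, so the following is only a minimal pure-Python sketch under that assumption. The names `top_logit_tokens`, `feature_dir`, `w_u`, and `vocab` are hypothetical; real inputs would come from a trained sparse autoencoder and its underlying model.

```python
def top_logit_tokens(feature_dir, w_u, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption (hypothetical setup): feature_dir is the feature's decoder
    row (length d_model) and w_u is the unembedding matrix stored as
    d_model rows of d_vocab floats. Each token's score is the dot product
    of feature_dir with that token's unembedding column.
    """
    d_model = len(feature_dir)
    scores = [
        sum(feature_dir[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Usage with a toy 2-dimensional model and a 3-token vocabulary.
neg, pos = top_logit_tokens([1.0, 0.0], [[0.5, -0.2, 0.9], [3.0, 3.0, 3.0]], ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -0.2)] [('c', 0.9)]
```

Dashboards may additionally normalize the decoder direction or apply the final layer norm before unembedding; this sketch does neither.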
Activation density: 0.121%
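Activation density is usually read as the fraction of sampled tokens on which the feature fires. The page does not state the threshold or the sample size, so this sketch assumes a token counts as firing when its activation is strictly greater than zero. `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above threshold.

    Assumption: 'firing' means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Usage: one of four tokens fires, so the density is 25%.
density = activation_density([0.0, 1.3, 0.0, 0.0])
print(f"{density:.3%}")  # 25.000%
```

Under this definition, a density of 0.121% means the feature is active on roughly 1 token in 826.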