INDEX
Explanations
names of locations and people
references to American pop culture and notable individuals
Negative Logits
ation      -0.95
ators      -0.94
ations     -0.90
ator       -0.88
ating      -0.85
ated       -0.80
Milan      -0.77
ATION      -0.72
ovember    -0.71
KER        -0.71
Positive Logits
heastern   0.98
ocial      0.78
icult      0.73
sap        0.72
ocratic    0.71
Downs      0.68
mort       0.68
rig        0.67
ellect     0.65
mosqu      0.64
Activations Density: 0.042%