Explanations
people's names with the specific pattern of being followed by a single capital letter and a non-zero activation value
references to individuals or names
Negative Logits
tem          -0.66
reciproc     -0.64
subsid       -0.62
nonex        -0.61
bearing      -0.60
disparate    -0.60
rud          -0.60
skate        -0.60
disappro     -0.59
contradict   -0.58
Positive Logits
eely         1.33
aylor        1.25
itsch        1.17
unn          1.17
ieves        1.16
evin         1.11
ettle        1.10
asser        1.09
elsen        1.08
olen         1.08
Activations Density 0.024%