INDEX
Explanations
proper nouns
mentions of specific individuals
Negative Logits
Anat      -0.74
Inst      -0.67
Bas       -0.65
AC        -0.64
AV        -0.64
origin    -0.63
Animal    -0.63
HD        -0.62
Air       -0.62
encl      -0.62
Positive Logits
Neill     3.17
Connell   2.29
Connor    2.22
Leary     2.20
Sullivan  2.14
Reilly    2.02
Neal      1.95
Brien     1.93
Malley    1.91
Neil      1.91
Activations Density: 0.030%