Explanations
Proper nouns related to people or places
References to the term "den" with varying capitalization
Negative Logits
Token        Logit
feats        -0.62
Typh         -0.62
EED          -0.59
olicy        -0.59
ties         -0.58
mph          -0.57
OTH          -0.57
heet         -0.56
ONS          -0.56
ments        -0.56
Positive Logits
Token        Logit
izens         1.21
omination     1.16
izen          1.14
unciation     1.12
omin          1.09
arius         1.03
unci          1.03
zel           1.03
holm          0.99
ormal         0.92
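Logit tables like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative entries. The sketch below assumes that procedure. The function names `logit_effects` and `top_and_bottom`, the token names, and every number are hypothetical toy values, not this feature's actual weights.

```python
from typing import List, Tuple


def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Dot the feature's decoder direction with each vocabulary column of the unembedding.

    Assumes `unembed` is stored row-major as d_model x vocab_size.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(
    vocab: List[str], effects: List[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) logit effects."""
    pairs = list(zip(vocab, effects))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.8, -0.4, 0.6, -0.2],
    [0.6, -0.4, 0.2, -0.6],
]
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(vocab, effects, k=2)
print(pos)  # tok_a and tok_c rank highest
print(neg)  # tok_b and tok_d rank lowest
```

Sorting the full vocabulary is simple but wasteful for a real vocabulary of tens of thousands of tokens; a partial selection (for example `heapq.nlargest`) would do the same job more cheaply.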
Activation Density: 0.035%
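Activation density is usually reported as the fraction of sampled token positions on which the feature fires, meaning its activation is above zero. Under that assumed definition, 0.035% corresponds to roughly 3.5 activating tokens per 10,000. The function name, the `threshold` parameter, and the toy data below are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical toy data: 20,000 token positions, and the feature fires on 7 of them.
acts = [0.0] * 20_000
for idx in (3, 150, 4_000, 9_001, 12_345, 15_000, 19_999):
    acts[idx] = 2.5

density = activation_density(acts)
print(f"{density:.3%}")
```

With 7 firing positions out of 20,000, the printed density is 0.035%, the same order as the value reported above.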