Explanations
proper nouns, particularly names and titles
Negative Logits
Token        Logit
afa          -0.15
agner        -0.14
atham        -0.14
Dare         -0.14
765          -0.14
EventData    -0.14
jsc          -0.13
Cousins      -0.13
.ul          -0.13
Simone       -0.13
Positive Logits
Token        Logit
em           0.22
roid         0.19
uel          0.18
itters       0.18
GENCY        0.17
tee          0.17
erald        0.17
esar         0.16
.persist     0.15
omain        0.15
Activations Density 0.028%