Explanations
proper nouns, particularly names of individuals

Negative Logits
unda      -0.63
ge        -0.62
atively   -0.60
atical    -0.57
opsis     -0.57
gments    -0.55
ctors     -0.53
ctor      -0.53
arily     -0.53
gebra     -0.53

Positive Logits
hips       0.79
hip        0.75
hops       0.72
'          0.65
pring      0.62
mith       0.62
peed       0.60
hire       0.59
boro       0.59
heet       0.59

Activation Density: 7.533%