INDEX
Explanations
proper nouns, particularly names of individuals
phrases indicating membership or inclusion in a group
NEGATIVE LOGITS
Token               Logit
bis                 -0.94
ramid               -0.84
ahime               -0.84
iosity              -0.72
lag                 -0.70
guiActiveUnfocused  -0.70
nosis               -0.68
ens                 -0.66
simulac             -0.66
fal                 -0.66
POSITIVE LOGITS
Token               Logit
former               1.11
Mohamed              1.03
Abd                  1.01
longtime             1.00
representatives      1.00
Reps                 0.99
Pamela               0.98
Christine            0.98
Lt                   0.97
Denis                0.97
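Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below shows that idea under stated assumptions: the names `w_dec`, `W_U`, and `top_logits` are hypothetical, the projection convention is assumed rather than confirmed for this dashboard, and the toy vocabulary and weights are made up purely for illustration.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: a feature's direct logit effect is its decoder direction
    w_dec (shape [d_model]) times the unembedding W_U (shape [d_model, d_vocab]).
    """
    logits = w_dec @ W_U                      # one value per vocabulary token
    order = np.argsort(logits)                # indices sorted ascending by logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example (hypothetical values): 2-dim residual stream, 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = np.array([[1.0, 0.5, -1.0, -0.2],
                [0.1, 0.5,  0.0, -0.6]])
w_dec = np.array([1.0, 1.0])

neg, pos = top_logits(w_dec, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once with `argsort` and slicing from both ends yields both tables from a single pass over the vocabulary.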
Activation Density: 0.277%
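A density of 0.277% means the feature is active on about 0.00277 of sampled tokens, roughly 2.77 in every 1,000. A minimal sketch of that statistic follows, assuming density is the fraction of token positions whose activation exceeds zero; the dashboard's actual threshold and sample dataset are not stated here, and `activation_density` plus the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy activations for 10 token positions (hypothetical); one position fires.
acts = np.array([0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```

Taking the mean of a boolean mask is a compact way to count the active fraction without an explicit loop.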