INDEX
Explanations
pronouns or nouns related to self-representation or self-identification
occurrences of the word "themselves."
NEGATIVE LOGITS
Sierra     -0.72
aster      -0.69
Derby      -0.67
amia       -0.66
etta       -0.66
Fulton     -0.65
CASE       -0.65
Syndicate  -0.64
Rail       -0.64
Ki         -0.64
POSITIVE LOGITS
selves         1.24
selves         1.01
tremend        0.80
self           0.79
conduc         0.79
creatively     0.78
underwater     0.77
themselves     0.74
proport        0.74
spontaneously  0.73
Activations Density 0.037%