INDEX
Explanations
references to statements or thoughts from others
instances of the word "have."
Negative Logits
eem         -0.71
etting      -0.65
typ         -0.63
dc          -0.60
arter       -0.58
housing     -0.58
Pair        -0.58
ourgeois    -0.57
OOL         -0.57
behold      -0.57
Positive Logits
been        1.51
been        1.31
undergone   1.12
Been        1.09
become      1.07
gotten      1.05
begun       1.03
gone        0.97
arisen      0.96
risen       0.95
Activations Density: 0.279%