INDEX
Explanations
instances where people hold certain opinions or make particular claims
the word "have."
Negative Logits
eem         -0.69
Apart       -0.62
catentry    -0.61
—-          -0.61
TG          -0.60
housing     -0.58
dc          -0.57
icking      -0.57
Anyway      -0.57
thief       -0.55
Positive Logits
been        1.45
been        1.23
undergone   1.09
begun       1.02
Been        1.00
become      0.98
gotten      0.97
gone        0.96
arisen      0.94
seen        0.93
Activations Density: 0.336%