INDEX
Explanations
phrases related to past actions or events
Negative Logits
bie       -0.62
PI        -0.62
âϦ        -0.61
bery      -0.60
hack      -0.60
owe       -0.57
hammer    -0.57
Voters    -0.57
trope     -0.56
etting    -0.56
Positive Logits
been          1.35
undergone     1.15
begun         1.15
gone          1.10
gotten        1.09
iths          1.07
been          1.02
previously    0.98
flown         0.94
taken         0.91
Activations Density 0.651%