INDEX
Explanations
words related to actions or events following a specific cue
instances of the word "then."
Negative Logits
assed          -0.67
crazy          -0.65
tery           -0.64
borgh          -0.62
Bout           -0.61
Apart          -0.59
Transactions   -0.59
belief         -0.58
fairness       -0.58
caps           -0.58
Positive Logits
proceeded      1.15
proceed        0.94
Ń·             0.91
veland         0.87
proceeds       0.84
ŃĶ             0.83
conclud        0.81
Ͻ              0.80
©¶æ            0.77
ebin           0.77
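Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting weight. The page does not say how these values were computed, so treat this as an assumption. The sketch below illustrates that technique; the names `decoder_dir`, `w_u`, `logit_weights`, and `top_logits`, the matrix layout, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_weights(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Hypothetical layout: decoder_dir has length d_model and w_u is a
    d_model x vocab_size matrix stored as a list of rows.
    Returns one direct-logit weight per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_logits(weights: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, weight) pairs."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of three tokens (made-up numbers).
vocab = ["proceeded", "crazy", "then"]
decoder_dir = [1.0, -0.5]
w_u = [[1.0, 0.0, 0.5],
       [0.0, 1.0, 0.2]]

weights = logit_weights(decoder_dir, w_u)
positive, negative = top_logits(weights, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

A real dashboard may also normalize the decoder direction or fold in a final layer norm before ranking. Those details are not given here.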
Activation Density: 0.055%
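Activation density is usually reported as the share of sampled tokens on which the feature fires. At 0.055% (0.00055), that works out to roughly one token in 1,800. The sketch below is minimal and assumes a flat list of per-token activations and a threshold of zero. Both assumptions are hypothetical, because the page states neither the sample set nor the threshold. The function name `activation_density` is illustrative.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:   # count the token as "firing"
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Made-up sample: 11 firing tokens out of 20,000.
acts = [0.0] * 19_989 + [2.3] * 11
density = activation_density(acts)
print(f"activation density: {density:.3%}")
```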