Explanations
action words indicative of causation or production
Negative Logits
IFT          -0.66
Forest       -0.65
itte         -0.63
ierre        -0.62
EEK          -0.61
talk         -0.61
cot          -0.61
oother       -0.58
ifted        -0.57
POSE         -0.56
Positive Logits
by            1.23
BY            1.08
aback         0.93
herein        0.87
therein       0.86
bys           0.83
jointly       0.81
by            0.81
exclusively   0.81
By            0.81
Activations Density 1.723%