Explanations
instances where the text prompts the reader to take action
phrases indicating readiness or willingness to participate in an activity
Negative Logits
orp          -0.73
roma         -0.72
orb          -0.72
oret         -0.69
agram        -0.66
ĸļ           -0.65
orean        -0.65
attribute    -0.65
auri         -0.64
ischer       -0.63
Positive Logits
Ready        0.98
Ready        0.91
itude        0.81
arity        0.78
ready        0.77
ready        0.75
housing      0.70
fort         0.67
wired        0.66
fire         0.66
Activations Density 0.015%
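
To make the density figure concrete, here is a minimal sketch of how such a percentage could be computed. It assumes, without confirmation from this page, that "activation density" means the share of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than `threshold`.

    Assumption: density is the fraction of tokens on which the feature fires,
    expressed as a percentage. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 3 of 20,000 tokens.
sample = [0.0] * 19_997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"Activations Density {density:.3f}%")
```

Under this reading, a density of 0.015% corresponds to the feature firing on roughly 15 out of every 100,000 tokens, which is consistent with a sparse feature tied to a narrow set of tokens such as "Ready" and "ready".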