Explanations
instances where events are described in relation to time
phrases indicating time and frequency
Negative Logits
hor          -0.50
oya          -0.49
Oregon       -0.49
hop          -0.48
Prosecut     -0.47
-)           -0.46
intention    -0.46
]'           -0.45
)'           -0.45
hal          -0.44
Positive Logits
fructose      0.53
*/(           0.51
ielding       0.51
iatrics       0.48
widget        0.47
favour        0.46
frightening   0.46
undet         0.45
disguise      0.45
unsuspecting  0.44
Activations Density 0.833%