Explanations
phrases that indicate ordering or positioning of information
Negative Logits
erupt     -0.62
yo        -0.61
average   -0.61
bro       -0.61
mons      -0.60
elta      -0.59
orting    -0.59
demol     -0.57
ibles     -0.57
Murder    -0.57
Positive Logits
fearing      1.31
precaution   1.23
because      1.10
lest         1.09
because      1.09
Because      1.05
bec          1.05
fear         1.00
ecause       0.99
fears        0.97
Activations Density: 7.935%