INDEX
Explanations
phrases indicating a continuous or repeated behavior
instances of the word "been" in various contexts
Negative Logits
eers      -0.77
rones     -0.69
achu      -0.67
erity     -0.65
opolis    -0.64
arta      -0.64
Bars      -0.64
izable    -0.64
odder     -0.62
regate    -0.62
Positive Logits
able       1.04
bitten     1.03
seen       0.97
taken      0.96
given      0.90
done       0.88
eaten      0.88
deemed     0.85
shown      0.84
beaten     0.83
Activations Density: 0.149%