Explanations
phrases emphasizing imperative actions or guidance
Negative Logits
Token              Logit
ReusableCell       -0.75
devint             -0.60
hausse             -0.54
cupa               -0.54
relève             -0.53
lifies             -0.52
mediates           -0.51
Wicidata           -0.51
BEEN               -0.51
supplémentaires    -0.50
Positive Logits
Token              Logit
ensure             0.91
try                0.85
first              0.81
get                0.81
have               0.79
hopefully          0.74
do                 0.71
somehow            0.71
not                0.71
firstly            0.66
Activations Density: 0.498%
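
An activation density figure like this is commonly read as the fraction of token positions on which the feature fires. The following is a minimal sketch of that calculation, assuming density means the share of activations strictly above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a feature "fires" on a token when its activation
    exceeds the threshold (zero by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density near 0.5% would mean the feature is active on roughly one token in two hundred.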