INDEX
Explanations
negations or words expressing denial
Negative Logits
<bos>         -0.84
Roskov        -0.64
Мексичка      -0.62
suivantes     -0.60
terakhir      -0.60
the           -0.57
cref          -0.54
betweenstory  -0.51
számára       -0.50
tertinggi     -0.50
Positive Logits
matter        1.45
amount        1.11
doubt         0.97
longer        0.90
tably         0.90
bodies        0.88
sooner        0.88
BODY          0.86
MATTER        0.81
wonder        0.80
Activations Density 0.108%