INDEX
Explanations
phrases expressing certainty or likelihood
Negative Logits
.criteria    -0.15
olf          -0.14
orthy        -0.14
allis        -0.14
_Tis         -0.14
hers         -0.14
Managed      -0.14
шив          -0.14
ald          -0.14
yme          -0.13
Positive Logits
preferred    0.31
preferred    0.26
ideal        0.26
superior     0.23
prefer       0.22
ideal        0.22
Preferred    0.22
best         0.21
предпоч      0.21
pref         0.21
Activations Density 0.161%