INDEX
Explanations
phrases indicating likelihood or probability
Negative Logits
Sphinx    -0.15
onn       -0.15
пох       -0.14
Tat       -0.14
ува       -0.14
Bou       -0.13
ře        -0.13
seems     -0.13
promin    -0.13
ekt       -0.13
Positive Logits
-pro      0.36
most      0.35
PRO       0.30
PRO       0.28
most      0.28
Most      0.27
MOST      0.27
pro       0.27
Most      0.27
MOST      0.25
Activations Density 0.028%