INDEX
Explanations
phrases indicating usefulness and helpfulness of information or resources
Negative Logits
ollower      -0.15
еÑĢÑĮ         -0.14
ÃĸL          -0.14
ì¹Ļ          -0.14
acas         -0.14
reau         -0.14
Incontri     -0.14
Güven        -0.13
nton         -0.13
ABCDEFGHI    -0.13
Positive Logits
useful        0.45
helpful       0.41
valuable      0.39
beneficial    0.35
handy         0.34
helf          0.34
Useful        0.33
invaluable    0.32
hữu           0.30
use           0.30
Activations Density: 0.151%