INDEX
Explanations
phrases that express requests for assistance or offers to help
Negative Logits
sd          -0.15
tring       -0.15
thouse      -0.14
бот         -0.14
intree      -0.13
rada        -0.13
jug         -0.13
жд          -0.13
.Hosting    -0.13
Ł           -0.13
Positive Logits
čer         0.16
703         0.15
خي          0.14
aines       0.14
761         0.13
ategory     0.13
ští         0.13
atal        0.13
-ves        0.13
alic        0.13
Activations Density 0.027%
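As a rough illustration of the density figure above, the sketch below assumes that activation density means the share of sampled tokens on which the feature's activation exceeds zero. The dashboard does not state that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which
    the feature fires; the source dashboard does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 27 firing tokens out of 100,000 sampled tokens.
sample = [1.0] * 27 + [0.0] * 99_973
print(f"{activation_density(sample):.3f}%")
```

Under that reading, a density of 0.027% corresponds to the feature firing on roughly 27 of every 100,000 tokens.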