INDEX
Explanations
requests for assistance or information
Negative Logits
welcome     -0.17
леж         -0.17
ent         -0.17
Welcome     -0.15
welcome     -0.15
/welcome    -0.15
andan       -0.14
Welcome     -0.14
丈          -0.13
aries       -0.13
Positive Logits
please      0.30
please      0.26
PLEASE      0.23
Please      0.23
Please      0.23
bitte       0.22
能          0.18
могли       0.17
能          0.17
请          0.16
Activations Density 0.160%