Explanations
phrases expressing a desire for acceptance and understanding
Negative Logits
Token     Logit
Begin     -0.15
abis      -0.14
afl       -0.14
à¹ģà¸ŀ    -0.13
à¸ŀล     -0.13
ัà¸ķà¸ĸ    -0.13
oggles    -0.13
·»        -0.13
akk       -0.13
виÑī      -0.13
Positive Logits
Token      Logit
treat      0.26
respect    0.22
Treat      0.22
treated    0.21
understand 0.21
treating   0.21
apprec     0.21
recip      0.20
care       0.20
treatment  0.19
Activations Density: 0.193%