Explanations
expressions of gratitude and appreciation
Negative Logits
û        -0.14
oler     -0.14
ород     -0.13
族自治   -0.13
umm      -0.12
ourage   -0.12
toler    -0.12
toa      -0.12
же       -0.12
taj      -0.12
Positive Logits
thanks   0.77
thank    0.77
Thanks   0.71
Thank    0.67
THANK    0.67
thanks   0.66
Thanks   0.66
thank    0.62
Thank    0.61
gracias  0.59
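The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of that ranking. It assumes that a token's weight is the dot product of the feature's decoder direction with that token's unembedding vector; the page itself does not state this. The function names and the toy 2-dimensional vectors are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector (assumed definition of a logit weight)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(weights: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative (token, weight) pairs."""
    positive = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(weights.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy data (hypothetical): a 2-d decoder direction and four token vectors.
    decoder = [1.0, 0.0]
    unembed = {
        "thanks": [0.8, 0.1],
        "gracias": [0.6, -0.2],
        "oler": [-0.1, 0.5],
        "toler": [-0.2, 0.3],
    }
    pos, neg = top_and_bottom(logit_weights(decoder, unembed), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With this toy data, `thanks` and `gracias` rank highest and `toler` and `oler` rank lowest. A real dashboard would score the model's full vocabulary instead of four hand-picked vectors.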
Activations Density 0.363%
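Activation density is the share of token positions on which the feature fires. A density of 0.363% works out to roughly 3.6 firing positions per 1,000 tokens. Below is a minimal sketch of the calculation. It assumes that "fires" means an activation strictly above a threshold of zero; that threshold and the function name `activation_density` are assumptions, not taken from the page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions whose activation
    exceeds `threshold` (zero by default, an assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy data: 1,000 positions, of which 3 are active.
    acts = [0.0] * 997 + [1.2, 0.4, 3.1]
    print(f"{activation_density(acts):.3%}")
```

Multiplying the returned fraction by 100 gives a percentage in the same form as the dashboard's figure.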