INDEX
Explanations
instances of appreciation and gratitude expressed towards others
Negative Logits
obi      -0.17
arp      -0.15
ixin     -0.15
arkin    -0.14
lus      -0.14
arna     -0.14
OOT      -0.14
رب       -0.13
.Rad     -0.13
adic     -0.13
Positive Logits
ably     0.26
ately    0.19
iable    0.18
iative   0.16
iado     0.16
ble      0.15
ible     0.15
族自治   0.15
INDER    0.15
Nice     0.14
Activations Density: 0.017%
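
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of tokens on which the feature fires above a threshold; that definition, the function name `activation_density`, the zero threshold, and the sample data are all illustrative assumptions, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature is
    active. The dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 17 of 100,000 tokens.
acts = [0.0] * 99_983 + [1.5] * 17
density = activation_density(acts)
print(f"{density:.3%}")  # 0.017%
```

Under this reading, a density of 0.017% corresponds to roughly 17 active tokens per 100,000, which would make this a sparse feature.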