INDEX
Explanations
instances of gratitude and appreciation
Negative Logits
arp         -0.15
lus         -0.15
obi         -0.15
arna        -0.15
пов         -0.15
ля          -0.14
avic        -0.14
翔          -0.13
ings        -0.13
.console    -0.13
Positive Logits
ably         0.25
iable        0.18
ately        0.16
iado         0.15
ble          0.15
ible         0.15
appreciate   0.15
full         0.15
ived         0.15
fully        0.14
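The page does not say how these logit values were computed. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below assumes that approach and ignores the final LayerNorm. The helper `top_logit_effects` and the parameters `decoder_dir` and `W_U` are hypothetical names used only for illustration, and the toy matrix stands in for a real model's weights.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by one feature's direct logit effect.

    Assumes the effect is decoder_dir @ W_U (no LayerNorm), which may differ
    from how any particular dashboard computes its values.
    decoder_dir: (d_model,) decoder vector of the feature
    W_U:         (d_model, vocab_size) unembedding matrix
    vocab:       list of vocab_size token strings
    """
    effects = decoder_dir @ W_U                    # shape (vocab_size,)
    order = np.argsort(effects, kind="stable")     # ascending, deterministic ties
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["ably", "iable", "arp", "lus"]
W_U = np.array([[0.5, 0.3, -0.2, -0.1],
                [0.0, 0.1, -0.1, -0.3]])
decoder_dir = np.array([0.5, 0.5])
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(negative)  # most negative first: lus, then arp
print(positive)  # most positive first: ably, then iable
```

Sorting once and reading both ends of the order gives the negative and positive lists in a single pass, which mirrors the two-column layout above.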
Activation Density: 0.017%
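Activation density is usually read as the percentage of tokens on which the feature fires. At 0.017%, that is roughly 1 token in 6,000 (1 / 0.00017 ≈ 5,900). The token sample and the firing threshold behind this page's figure are not stated, so the minimal sketch below assumes "fires" means an activation strictly greater than zero. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens where the feature fires.

    Assumes a token counts as firing when its activation exceeds `threshold`
    (0.0 by default); the threshold used for this page is not given.
    activations: 1-D array of the feature's activation on each sampled token
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 17 of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:17] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")  # prints 0.017%
```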