INDEX
Explanations
phrases expressing gratitude or appreciation
instances where gratitude or acknowledgment is expressed
Negative Logits
Token       Logit
atform      -0.67
uve         -0.63
女          -0.62
ength       -0.62
avis        -0.60
ille        -0.58
sci         -0.57
âĶĢâĶĢ      -0.57
]'          -0.57
abulary     -0.55
Positive Logits
Token       Logit
giving      1.25
largely     0.82
to          0.71
ulously     0.71
partly      0.70
mainly      0.69
roxy        0.68
raltar      0.66
cription    0.66
thankfully  0.65
Activations Density 0.024%
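
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the percentage of token positions where the feature's activation exceeds a threshold; that definition is an assumption, since the page does not state it. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold`. The source does not define this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which are active.
sample = [0.0] * 9997 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this definition, a density of 0.024% would correspond to the feature firing on roughly 24 out of every 100,000 token positions.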