Explanations
expressions of gratitude or appreciation
expressions of gratitude or thanks
Negative Logits
ulhu   -0.71
Dare   -0.70
ahime  -0.67
pos    -0.67
ther   -0.65
wide   -0.63
scape  -0.63
dq     -0.63
spir   -0.62
lo     -0.62
Positive Logits
thanking       1.00
thanked        0.97
thank          0.77
applause       0.77
imaru          0.77
him            0.75
forgiveness    0.75
sarcast        0.73
congratulated  0.73
ESCO           0.72
Activations Density: 0.017%