INDEX
Explanations
expressions of gratitude
expressions of gratitude or appreciation
Negative Logits
女           -0.87
uclear      -0.72
arist       -0.67
objects     -0.67
Osc         -0.65
åĬ          -0.62
spir        -0.62
projected   -0.61
ago         -0.61
iche        -0.61
Positive Logits
giving      1.41
GOODMAN     0.80
haw         0.80
Guys        0.79
gements     0.78
Thanks      0.78
pardon      0.76
Credits     0.75
bye         0.74
brance      0.72
Activations Density 0.016%