INDEX
Explanations
expressions of gratitude
expressions of gratitude and appreciation
Negative Logits
agate      -0.73
oto        -0.67
improve    -0.65
FO         -0.62
etc        -0.62
calling    -0.62
change     -0.61
analy      -0.60
uter       -0.59
smoking    -0.59
Positive Logits
giving            0.97
acknowled         0.96
gements           0.87
citiz             0.86
pardon            0.83
FUL               0.79
acknowledgment    0.78
fulness           0.77
gratitude         0.77
ledged            0.77
Activations Density 0.017%