INDEX
Explanations
phrases expressing gratitude or acknowledgment of a positive contribution
expressions of gratitude or acknowledgment
Negative Logits
uve        -0.64
atform     -0.63
mania      -0.58
desired    -0.57
ength      -0.57
verages    -0.57
riage      -0.56
etus       -0.54
oos        -0.54
]'         -0.52
Positive Logits
giving     1.17
to         0.97
largely    0.86
partly     0.78
mainly     0.75
chiefly    0.74
primarily  0.70
thereto    0.68
mostly     0.67
solely     0.66
Activations Density: 0.023%