Explanation: expressions of gratitude
Negative Logits
iche         -0.68
conserv      -0.68
女           -0.63
projected    -0.62
projecting   -0.62
place        -0.61
atform       -0.61
inese        -0.60
inance       -0.55
avis         -0.55
Positive Logits
giving        1.39
goodness      0.89
SG            0.89
ESCO          0.88
heavens       0.88
gracious      0.86
rats          0.82
gements       0.81
god           0.79
acknowled     0.79
Activation density: 0.597%