INDEX
Explanations
expressions of appreciation or gratitude towards others
expressions of gratitude and thankfulness
Negative Logits
ģĸ          -0.74
xtap        -0.66
depending   -0.64
depending   -0.60
oiler       -0.59
blaming     -0.58
defaults    -0.58
ocide       -0.58
ģ«          -0.57
losers      -0.57
Positive Logits
courage      0.79
tirelessly   0.78
responsibly  0.75
!]           0.75
brave        0.74
generously   0.74
:)           0.73
yesterday    0.68
courageous   0.67
peacefully   0.67
Activations Density: 0.181%