Explanations
references to being thankful or showing gratitude
Negative Logits
ihar      -0.78
ilib      -0.77
ires      -0.77
urden     -0.75
apest     -0.75
emies     -0.75
icut      -0.73
terness   -0.73
ardless   -0.72
ecycle    -0.72
Positive Logits
generous       1.26
clever         1.03
generosity     1.01
diligent       1.01
ingenuity      0.99
advancements   0.96
donations      0.94
ingenious      0.94
persever       0.93
hindsight      0.93
Activations Density 0.241%