Explanations
Words related to appreciation, support, and positivity
Positive attributes and expressions of gratitude
Negative Logits
Token          Logit
toggle         -0.73
guiActiveUn    -0.68
edit           -0.68
dod            -0.67
aded           -0.65
udo            -0.62
Presumably     -0.62
Worse          -0.62
rez            -0.62
fix            -0.61
Positive Logits
Token          Logit
compassionate   0.81
courageous      0.78
sacrific        0.78
philanthrop     0.76
stewards        0.73
multicultural   0.73
respectfully    0.72
comr            0.72
empowering      0.71
endeavors       0.71
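Tables like the two above are commonly produced by projecting a feature's output (decoder) direction through the model's unembedding matrix and listing the vocabulary tokens with the largest and smallest resulting logit contributions. The sketch below assumes that approach; it is not confirmed as the exact method behind these numbers (dashboards may also normalize or rescale). The function names `logit_effects` and `top_and_bottom` and the toy matrix are hypothetical, for illustration only.

```python
# Minimal sketch, assuming logit effects = decoder direction @ unembedding.
# All names and values here are hypothetical illustrations.

def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature direction through an unembedding matrix.

    decoder_dir: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of len(vocab) floats (one column per token).
    vocab: list of token strings, one per unembedding column.
    Returns a dict mapping token -> logit contribution.
    """
    scores = [0.0] * len(vocab)
    for d, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d * w
    return dict(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Return (k most boosted tokens, k most suppressed tokens)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[-k:][::-1], ranked[:k]


# Toy example: a 2-dimensional model with a 3-token vocabulary.
effects = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, -0.4, 0.9],
             [0.4, 0.2, -0.6]],
    vocab=["fix", "edit", "compassionate"],
)
positive, negative = top_and_bottom(effects, k=1)
print(positive, negative)
```

In this toy case "compassionate" ends up with the largest positive contribution and "edit" with the most negative one, mirroring how the two tables split boosted and suppressed tokens.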
Activation Density: 1.783%
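Activation density is usually read as the share of token positions on which the feature fires (has a nonzero activation). The sketch below assumes that definition with a configurable threshold; the function name `activation_density` and the threshold parameter are hypothetical, not taken from the dashboard itself.

```python
# Minimal sketch, assuming density = % of tokens where activation > threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# One firing position out of four gives 25% density.
density = activation_density([0.0, 1.2, 0.0, 0.0])
print(f"{density:.3f}%")
```

Under this definition, a density of 1.783% would mean the feature is active on fewer than 2 of every 100 tokens.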