INDEX
Explanations
words related to emotions and feelings
emotional expressions related to loss, privilege, and gratitude
Negative Logits
arsen            -0.74
apons            -0.71
habitable        -0.70
predicate        -0.68
prev             -0.68
heat             -0.67
Pred             -0.66
ensibly          -0.66
plaus            -0.66
pill             -0.65
Positive Logits
thanking          0.95
gratitude         0.91
condolences       0.90
THANK             0.85
!!!!!             0.83
saddened          0.81
thank             0.79
congratulations   0.78
congr             0.77
Thank             0.75
Activation Density: 0.270%