INDEX
Explanations
expressions related to being happy or pleased about something
expressions of happiness related to the state of being
Negative Logits
ongyang     -0.75
idation     -0.67
ritz        -0.66
rigs        -0.65
strain      -0.63
risis       -0.63
inference   -0.62
rones       -0.60
fray        -0.60
ortmund     -0.59
Positive Logits
able        1.36
reminded    1.06
alive       0.98
reunited    0.98
honest      0.91
que         0.91
judged      0.89
treated     0.89
amed        0.88
reborn      0.88
Activations Density: 0.105%