Explanations
expressions of enjoyment or positive experiences related to activities or moments
Negative Logits
Token        Logit
older        -0.17
uras         -0.15
atte         -0.14
owing        -0.14
esar         -0.14
arin         -0.14
gere         -0.14
сли          -0.14
enjoying     -0.14
enie         -0.14
Positive Logits
Token        Logit
ably          0.36
able          0.24
ment          0.24
ments         0.24
ables         0.20
erals         0.19
/dis          0.19
yourselves    0.17
watching      0.17
themselves    0.17
Activations Density 0.038%