INDEX
Explanations
expressions of personal feelings and emotional responses
Negative Logits
Flem       -0.17
-FIRST     -0.15
toHave     -0.15
isay       -0.14
ieg        -0.14
oice       -0.14
вок        -0.14
avou       -0.14
令         -0.14
igor       -0.14
Positive Logits
feel       0.23
aware      0.18
happen     0.17
laugh      0.17
available  0.16
/us        0.16
216        0.16
appear     0.15
gig        0.15
feel       0.15
Activations Density: 0.033%