INDEX
Explanations
expressions of personal opinions and feelings about social interactions
Negative Logits
Token     Logit
IB        -0.15
gart      -0.14
jes       -0.14
kö        -0.14
su        -0.14
jb        -0.14
Geb       -0.14
=""></    -0.14
Xxx       -0.14
ungi      -0.13
Positive Logits
Token     Logit
would     0.18
would     0.16
Would     0.16
ureau     0.16
Would     0.15
cus       0.15
ave       0.15
wouldn    0.15
象        0.14
bbe       0.14
Activations Density: 0.138%