INDEX
Explanations
references to sentiment around shared interests and likability
Negative Logits
like      -0.18
cape      -0.15
ogy       -0.15
शन        -0.15
ált       -0.14
roe       -0.14
locate    -0.14
ÙĬج       -0.14
dy        -0.14
linux     -0.14
Positive Logits
minded    0.39
-minded   0.39
Minds     0.26
WISE      0.26
able      0.25
minds     0.25
hood      0.22
ability   0.21
ewise     0.20
inded     0.20
Activations Density 0.033%