INDEX
Explanations
phrases expressing personal preferences and likes
Negative Logits
acco         -0.18
line         -0.17
iste         -0.17
ista         -0.16
sel          -0.16
ils          -0.16
/Linux       -0.16
shed         -0.15
slightest    -0.15
tes          -0.15
Positive Logits
-minded      0.25
/dis         0.24
able         0.21
minded       0.20
/lo          0.20
WISE         0.17
unto         0.17
latter       0.17
elihood      0.16
ably         0.16
Activations Density 0.080%