INDEX
Explanations
phrases emphasizing high recognition or cultural significance
Negative Logits
isay      -0.15
trys      -0.15
cola      -0.15
apter     -0.14
oppins    -0.14
ults      -0.14
iferay    -0.14
oningen   -0.14
odesk     -0.14
okit      -0.14
Positive Logits
talked        0.28
-talk         0.24
loved         0.22
anticipated   0.21
visited       0.20
discussed     0.20
followed      0.20
well          0.19
buzz          0.19
-ce           0.19
Activations Density 0.054%
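
The density figure can be read as the share of sampled tokens on which the feature fires at all. A minimal sketch of that calculation follows, assuming density is defined as the fraction of tokens with activation above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (above-threshold) activation. The dashboard's exact definition
    is not stated in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.054% would correspond to roughly 54 active tokens per 100,000 sampled.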