INDEX
Explanations
phrases indicating a clarification or further explanation
phrases that express personal opinions or invitations to listen
Negative Logits
ibaba        -0.81
accomp       -0.69
ombs         -0.69
veter        -0.68
ibrary       -0.67
axter        -0.67
eatures      -0.66
unden        -0.65
affiliated   -0.63
ioch         -0.63
Positive Logits
lees         0.77
zzo          0.77
eeee         0.72
thou         0.72
xff          0.69
dear         0.67
yah          0.65
Mister       0.65
sir          0.65
bro          0.64
Activations Density: 0.094%