Explanations
dialogue and conversational exchanges
Negative Logits
yg            -0.16
gro           -0.15
kud           -0.14
idual         -0.14
especially    -0.14
-thumb        -0.14
ppl           -0.13
くらい        -0.13
ossa          -0.13
aggio         -0.13
Positive Logits
sir           0.28
Mr            0.22
Sir           0.20
Mr            0.20
先生          0.20
Witness       0.19
Sir           0.18
您的          0.16
Exhib         0.16
mr            0.16
Activations Density 0.019%