INDEX
Explanations
phrases containing the word "by" indicating attribution or authorship
Negative Logits
isms          -0.81
hops          -0.77
idences       -0.77
worthiness    -0.73
database      -0.73
zone          -0.72
leeve         -0.72
uality        -0.70
bra           -0.70
olate         -0.69
Positive Logits
Ay            0.92
Todd          0.91
Christine     0.88
Stephen       0.87
Ellen         0.86
Richard       0.85
Tom           0.85
Yosh          0.83
former        0.83
Shinzo        0.83
Activations Density: 0.032%