INDEX
Explanations
phrases attributing authorship or responsibility
Negative Logits
co      -0.49
sa      -0.47
all     -0.44
sent    -0.44
<eos>   -0.44
her     -0.42
ոյ      -0.41
indu    -0.41
k       -0.41
no      -0.40
Positive Logits
Monfieur          1.05
Efq               0.98
itſelf            0.98
Jefus             0.94
Reſ               0.94
UnusedPrivate     0.92
Theſe             0.92
WebElementEntity  0.92
doubtnut          0.91
ſche              0.90
Activations Density 0.000%