INDEX
Explanations
phrases indicating information or instructions given to someone
Negative Logits
Token     Logit
ILCS      -0.67
vous      -0.64
prus      -0.63
sidx      -0.62
aho       -0.62
adesh     -0.62
adish     -0.59
anu       -0.59
ockets    -0.59
aird      -0.59
Positive Logits
Token         Logit
by            1.22
beforehand    0.96
orally        0.93
repeatedly    0.84
by            0.81
aloud         0.77
tale          0.77
aback         0.76
BY            0.75
verbally      0.74
Activations Density: 0.037%