Explanations
instances of the word "honor" in various forms
Negative Logits
Token     Logit
ors       -0.30
or        -0.17
ionate    -0.17
tas       -0.16
Pes       -0.15
tul       -0.15
odo       -0.15
ODO       -0.15
tan       -0.15
t         -0.14
Positive Logits
Token     Logit
cho       0.22
chos      0.19
ed        0.19
kins      0.18
ester     0.17
edin      0.17
zos       0.17
olulu     0.17
TRL       0.16
obo       0.16
Activations Density: 0.006%