INDEX
Explanations
various forms of the word "honest" and related concepts
Negative Logits
Token            Logit
tery             -0.18
lassian          -0.17
hlen             -0.16
.scalablytyped   -0.16
sWith            -0.16
izi              -0.15
uteur            -0.15
ture             -0.14
sko              -0.14
rost             -0.14
Positive Logits
Token            Logit
Abe              0.22
-to              0.20
broker           0.20
/auth            0.19
brokers          0.18
ably             0.18
appraisal        0.17
/raw             0.17
bones            0.15
Broker           0.15
Activations Density: 0.036%