Explanations
phrases related to honesty and transparency
NEGATIVE LOGITS
Token       Logit
чи          -0.15
sea         -0.15
isz         -0.14
ابت         -0.14
reli        -0.14
stå         -0.14
اغ          -0.14
oola        -0.14
ISMATCH     -0.14
defgroup    -0.14
POSITIVE LOGITS
Token       Logit
honest      0.27
honesty     0.23
candid      0.21
frank       0.19
honestly    0.17
Kauf        0.17
admit       0.16
ecta        0.15
ランド      0.15
Honest      0.15
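Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. This page does not state its method, so the sketch below assumes that one. `logit_effects` and `top_and_bottom` are hypothetical helpers, and the two-dimensional vectors are toy data, not this feature's weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder vector with each token's unembedding column.

    Assumption: the listed logits are this direct (linear) effect only,
    ignoring layer norm and any downstream layers.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most-promoted and k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    # Toy 2-d example (hypothetical values).
    effects = logit_effects(
        [1.0, 0.0],
        {"honest": [0.27, 0.0], "sea": [-0.15, 0.0], "frank": [0.19, 0.5]},
    )
    positive, negative = top_and_bottom(effects, 1)
    print("positive:", positive)
    print("negative:", negative)
```

Only the decoder vector and the unembedding enter the computation, so the lists describe what the feature pushes the output toward, not which inputs make it fire.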
Activation density: 0.082%
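The density figure is read here as the percentage of token positions on which the feature activates above zero. That definition is an assumption, since the page gives only the number. Under it, 0.082% means about 82 firing positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature exceeds `threshold`.

    Assumption: "density" means firing frequency over all sampled tokens.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


if __name__ == "__main__":
    # 82 firing positions out of 100,000 tokens (illustrative data).
    acts = [1.0] * 82 + [0.0] * (100_000 - 82)
    print(f"{activation_density(acts):.3f}%")
```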