INDEX
Explanations
instances of the word "honest"
Negative Logits
LAN          -0.86
CHAT         -0.73
Krish        -0.72
617          -0.72
515          -0.69
lav          -0.69
acid         -0.69
Libraries    -0.68
avia         -0.67
KEN          -0.67
Positive Logits
honest       0.91
honesty      0.85
broker       0.82
truthful     0.73
princ        0.73
urance       0.71
sounding     0.70
lly          0.70
frank        0.70
Honest       0.68
Activations Density 0.015%