INDEX
Explanations
words related to telling lies or being deceitful
instances of the word "lying" to identify discussions of dishonesty or deception
Negative Logits
Ultra      -0.77
obs        -0.76
ains       -0.76
aldi       -0.76
ugal       -0.75
iles       -0.73
ISO        -0.72
ilation    -0.71
arthy      -0.69
FN         -0.69
Positive Logits
utenant    0.79
vulner     0.79
dormant    0.79
sembly     0.74
lie        0.74
detector   0.73
uten       0.73
awake      0.71
lying      0.70
skelet     0.69
Activations Density: 0.012%