Explanations
instances of the word "lie" and its variations, indicating themes of deception or falsehood
Negative Logits
Jacobsen      -0.80
McCl          -0.78
AndEndTag     -0.76
ėl            -0.74
рс            -0.73
بالإنجليزية   -0.73
σσ            -0.72
agoza         -0.71
urator        -0.71
Medford       -0.70
Positive Logits
Lie       1.11
lie       1.10
Lie       1.02
LIE       0.98
Lies      0.96
Lies      0.94
lying     0.93
lying     0.89
Lying     0.89
lies      0.89
Activations Density 0.088%
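
A density figure like this is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the function name, the zero threshold, and the sample data are hypothetical and are not taken from this page or from the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens with a
    non-zero activation. The page itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, 9 of which activate the feature.
sample = [0.0] * 9991 + [1.3] * 9
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density well below 1% means the feature is sparse: it stays silent on nearly all tokens and fires only on a narrow set, such as forms of the word "lie".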