Explanations
references to lying or deceitful behavior
Negative Logits
Décès        -0.69
рады         -0.66
Allentown    -0.63
évaluateur   -0.63
Câmara       -0.62
hadiran      -0.61
McIn         -0.60
antemano     -0.59
{\           -0.59
Maynard      -0.59
Positive Logits
lie     2.43
lies    2.21
lying   2.06
LIE     2.00
Lie     1.93
Lies    1.82
Lying   1.80
Lies    1.79
Lying   1.74
Lie     1.70
Activations Density 0.070%