INDEX
Explanations
references to deceit or dishonesty
Negative Logits
Token        Logit
оригіналу    -0.74
faſt         -0.70
eſt          -0.68
Jefus        -0.67
houſe        -0.67
uſe          -0.67
updates      -0.67
pleaf        -0.65
uſ           -0.63
ſp           -0.63
Positive Logits
Token        Logit
lie          1.35
lies         1.15
LIE          0.85
liegen       0.83
Lie          0.83
lied         0.81
lying        0.80
windowFixed  0.80
laid         0.80
Lie          0.79
Activations Density 0.104%
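
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that activation density is the fraction of sampled token positions on which the feature fires (activation above zero). That definition is an assumption, not something this page states, and the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions where
    the feature is active. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 1000 token positions.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.104% would correspond to the feature firing on roughly one token in a thousand, which fits a feature tied to a narrow family of tokens such as "lie" and "lying".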