INDEX
Explanations
words related to deception and falsehood
Negative Logits
orianCalendar      -0.56
iibo               -0.56
ALF                -0.52
bootstrapcdn       -0.52
iettes             -0.50
ویکیپدی            -0.48
HomeAsUpEnabled    -0.48
IFA                -0.47
FID                -0.46
ysuckle            -0.46
Positive Logits
étoient            0.60
avoient            0.56
sindic             0.53
grecque            0.51
venezolano         0.51
coffret            0.50
advogado           0.50
femininas          0.50
lienzo             0.49
négo               0.49
Activations Density: 1.278%