INDEX
Explanations
terms associated with falsehoods and inaccuracies
falsehoods and deceptions
Negative Logits
Consideration  -0.44
Aga            -0.43
Kita           -0.40
iVar           -0.40
Kita           -0.40
ườn            -0.39
ngang          -0.39
Kidd           -0.39
Hug            -0.39
ITA            -0.39

Positive Logits
false   1.40
False   1.30
False   1.25
false   1.22
fausse  1.17
falsa   1.13
falsos  1.12
falso   1.09
falsas  1.08
FALSE   1.07

Activations Density 0.025%