Explanations
phrases related to deception or manipulation of information
appearing convincing or impressive
Negative Logits
bereitung       -0.35
jeunesse        -0.34
horabuena       -0.33
่าน             -0.33
Jahr            -0.33
dúvidas         -0.32
year            -0.32
titulaire       -0.31
illustration    -0.31
Knight          -0.31
Positive Logits
ligiloj         0.59
EconPapers      0.56
AccessorTable   0.56
ftagPool        0.55
Fake            0.54
CppMethod       0.54
tonode          0.53
Fake            0.53
出版年          0.52
fake            0.50
Activations Density: 0.032%