INDEX
Explanations
mentions of cheating or deceptive behavior
words related to dishonest behavior, particularly cheating
Negative Logits
Token             Logit
oran              -0.70
oan               -0.67
Archdemon         -0.66
DragonMagazine    -0.63
Pain              -0.62
istan             -0.62
RIS               -0.60
Oper              -0.60
affer             -0.59
english           -0.58
Positive Logits
Token             Logit
cheat             0.91
cheating          0.83
ulative           0.82
herer             0.79
cheated           0.76
cheat             0.73
yre               0.69
chet              0.67
sheet             0.66
Bastard           0.64
Activations Density: 0.012%
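
To put the density figure in perspective, here is a minimal sketch of how such a number could be computed, assuming density means the fraction of token positions where the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" here means the share of tokens on which the
    feature fires at all. The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 25,000 token positions.
sample = [0.0] * 24_997 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.012%
```

Under that assumption, a density of 0.012% corresponds to roughly one active token in every 8,300, which is consistent with a feature that responds to a narrow set of words.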