Explanations
instances where someone is pretending or falsifying something
instances of the word "pretend" and its variations
Negative Logits
KO          -0.69
Lith        -0.66
impacting   -0.63
iang        -0.63
UC          -0.63
oly         -0.62
âĨij        -0.62
Overall     -0.62
Region      -0.62
Tank        -0.61
Positive Logits
pretend     3.46
pretended   2.66
pretending  2.59
disguise    1.54
pret        1.47
pret        1.37
Pret        1.36
disgu       1.30
phony       1.26
fictitious  1.25
Activations Density 0.022%