INDEX
Explanations
terms and phrases related to deception or falsification
Negative Logits
Token               Logit
pour                -0.91
onen                -0.89
backer              -0.73
esa                 -0.71
itsch               -0.69
aird                -0.69
guiActiveUnfocused  -0.68
cloth               -0.67
winner              -0.66
seek                -0.66
Positive Logits
Token               Logit
innocence           0.99
pas                 0.84
phony               0.73
identity            0.73
ignorance           0.73
ewitness            0.72
identities          0.72
deception           0.72
faked               0.71
Invasion            0.71
Activation Density: 0.021%