INDEX
Explanations
words related to deception or withholding information
words related to the concept of "dudes."
Negative Logits
sustained   -0.65
stall       -0.65
flap        -0.65
hungry      -0.64
hill        -0.63
chest       -0.63
PAT         -0.63
scra        -0.62
Atkinson    -0.61
local       -0.61
Positive Logits
udes        4.70
ude         3.24
uding       2.17
uded        1.94
usions      1.62
usion       1.39
uders       1.27
usive       1.20
uder        1.15
ud          1.05
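The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). A minimal sketch of how such lists could be produced follows. It assumes the usual approach for sparse-autoencoder features: take the dot product of the feature's decoder direction with each token's unembedding column, then keep the top and bottom k tokens. The names `decoder_dir` and `unembed` and all numbers in the example are hypothetical toy values. They are not this feature's real weights.

```python
# Hypothetical sketch: the decoder direction, unembedding columns, and toy
# numbers below are made up for illustration, not taken from this feature.

def logit_effects(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        tok: sum(a * b for a, b in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]          # largest effects first
    negative = ranked[::-1][:k]    # most negative effects first
    return positive, negative

# Toy 2-dimensional example (assumed values).
decoder_dir = [1.0, -0.5]
unembed = {
    "udes": [2.0, 0.0],
    "ude": [1.0, -1.0],
    "hill": [-0.5, 0.5],
    "stall": [0.0, 1.0],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print(positive)  # [('udes', 2.0), ('ude', 1.5)]
print(negative)  # [('hill', -0.75), ('stall', -0.5)]
```

In a real model the same ranking would run over the full vocabulary, with k = 10 to match the lists shown here.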
Activation Density: 0.010%
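Assume activation density means the fraction of tokens on which the feature's activation is above zero. Under that assumption, 0.010% corresponds to roughly 1 firing per 10,000 tokens. The sketch below illustrates the arithmetic. The helper name `activation_density` and the activation values are hypothetical.

```python
# Hypothetical sketch: density = (tokens where the feature fires) / (all tokens).

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data (assumed): the feature fires on 1 of 10,000 tokens.
acts = [0.0] * 9999 + [3.2]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```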