INDEX
Explanations
words associated with falling into specific categories or traps
phrases related to falling into categories or traps
Negative Logits
die         -0.70
indu        -0.70
cies        -0.66
don         -0.65
boys        -0.65
killer      -0.63
be          -0.62
banished    -0.62
yip         -0.62
warning     -0.61
Positive Logits
izoph       0.79
adulthood   0.75
qus         0.74
ibilities   0.73
Recession   0.71
âĸĴ         0.70
Role        0.70
obs         0.69
accordance  0.69
clusion     0.69
Activations Density 0.049%