Explanations
words related to traps or mechanisms designed to capture or deceive something
references to traps, both literal and metaphorical
Negative Logits
Token     Logit
edly      -0.87
issance   -0.84
sis       -0.82
verty     -0.70
played    -0.69
care      -0.69
sburg     -0.69
eda       -0.65
doms      -0.64
shire     -0.63
Positive Logits
Token     Logit
door      1.36
doors     1.22
idon      0.85
finding   0.81
Hole      0.76
traps     0.72
trap      0.69
trap      0.67
hered     0.67
resses    0.67
Activations Density: 0.037%