INDEX
Explanations
words related to traps
the concept of traps in various contexts
Negative Logits
Token         Logit
Merit         -0.71
edly          -0.70
Kardashian    -0.66
league        -0.66
issance       -0.63
played        -0.62
normalized    -0.61
Glory         -0.60
sis           -0.60
concerned     -0.60
Positive Logits
Token         Logit
door          1.35
trap          1.33
trap          1.23
traps         1.13
doors         1.08
Trap          1.06
Traps         0.99
trapping      0.87
idon          0.86
Hole          0.83
Activations Density: 0.013%