Explanations
words associated with deception and isolation
Negative Logits
PDATE          -0.74
theless        -0.73
Shack          -0.71
Dragonbound    -0.69
Doctrine       -0.66
Belt           -0.64
miscarriage    -0.64
Penet          -0.63
LORD           -0.62
Heller         -0.61
Positive Logits
ations         1.94
ating          1.81
ates           1.73
ators          1.70
atory          1.56
ator           1.55
ational        1.51
ative          1.43
ated           1.43
ate            1.34
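Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below assumes that procedure; the function name `top_bottom_logits`, the matrix shapes, and the toy vocabulary are hypothetical and not taken from this page.

```python
import numpy as np

def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    (shape: d_model) onto the unembedding W_U (shape: d_model x vocab_size)
    and return the k highest and k lowest (token, logit) pairs."""
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)                        # ascending order
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example (made-up numbers): d_model = 2, vocab_size = 4.
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 0.0]])
vocab = ["ations", "Shack", "ating", "the"]
top, bottom = top_bottom_logits(w_dec_row, W_U, vocab, k=2)
print(top)     # [('ations', 2.0), ('ating', 0.5)]
print(bottom)  # [('Shack', -1.0), ('the', 0.0)]
```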
Activation density: 0.025%
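Activation density is usually read as the fraction of token positions on which the feature's activation is positive. Assuming that definition, a density of 0.025% corresponds to roughly one firing per 4,000 tokens. The helper name and the toy activation array below are hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Hypothetical helper: fraction of token positions where the
    feature activation is strictly greater than zero."""
    acts = np.asarray(acts)
    return float(np.count_nonzero(acts > 0)) / acts.size

# Toy example: 4,000 token positions, the feature fires on exactly one.
acts = np.zeros(4000)
acts[7] = 3.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```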