INDEX
Explanations
references to specific names or titles
Negative Logits
ORGE           -0.75
steroids       -0.60
TYPE           -0.59
PsyNetMessage  -0.58
listed         -0.57
removable      -0.57
desk           -0.57
steroid        -0.56
AAP            -0.56
DonaldTrump    -0.56
Positive Logits
mire           1.09
achev          1.04
atche          0.95
asus           0.95
atell          0.90
achu           0.89
sburgh         0.87
lia            0.86
rint           0.84
atoon          0.83
Activations Density: 0.007%