INDEX
Explanations
instances of posts and submissions attributed to users
Negative Logits
Token        Logit
arer         -0.18
ilder        -0.16
erif         -0.16
773          -0.15
pedia        -0.15
Traits       -0.15
ř            -0.15
pís          -0.14
asu          -0.14
_Construct   -0.13
Positive Logits
Token        Logit
Score        0.15
Ven          0.15
ven          0.14
otope        0.14
Stern        0.14
onto         0.14
-score       0.14
odesk        0.14
Lâm          0.13
score        0.13
Activations Density 0.010%