INDEX
Explanations
websites or social media handles
references to online social media activity and interactions
Negative Logits
INAL          -0.60
der           -0.51
EMBER         -0.51
ECA           -0.51
ratulations   -0.49
iety          -0.48
RET           -0.47
OUT           -0.46
Reviewed      -0.46
lycer         -0.46
Positive Logits
ingle         0.60
schild        0.57
k             0.57
hern          0.54
etsk          0.53
kov           0.50
ased          0.49
tsky          0.48
hov           0.48
chens         0.47
Activations Density: 0.176%