INDEX
Explanations
Twitter usernames
usernames or handles from social media
NEGATIVE LOGITS
ãĢIJ       -0.63
bruising   -0.61
margins    -0.61
guiName    -0.59
snowball   -0.59
Curiosity  -0.57
ACTIONS    -0.57
whims      -0.56
,"         -0.56
".         -0.56
POSITIVE LOGITS
)     1.19
/)    1.15
)!    1.10
)...  1.09
),    1.02
)-    1.02
%)    0.98
>)    0.96
!)    0.96
).    0.93
Activations Density 0.101%