INDEX
Explanations
mentions of political figures and social media handles
usernames and mentions in social media context
Negative Logits
Token        Logit
).[          -0.76
".[          -0.73
''.          -0.72
�            -0.68
Orient       -0.66
."[          -0.63
Yug          -0.62
increment    -0.62
──           -0.62
autonom      -0.62
Positive Logits
Token        Logit
Jr           1.26
@            1.16
_            1.04
FB           1.02
congr        0.96
Stud         0.91
official     0.90
why          0.90
john         0.87
HQ           0.86
Activations Density 0.080%