Explanations
topics related to social and political issues
expressions of dissatisfaction or complaints
Negative Logits
estyles     -0.61
ゴン        -0.58
endum       -0.57
Timeline    -0.56
iece        -0.56
Canaver     -0.55
ラ          -0.55
セ          -0.53
ished       -0.53
ubuntu      -0.53
Positive Logits
they          1.19
THEY          1.16
their         1.15
They          1.14
theirs        1.12
THEIR         1.12
they          1.10
They          0.98
themselves    0.98
Their         0.94
Activations Density 1.317%
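
As a rough illustration of the density figure above, here is a minimal sketch. It assumes that "density" means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density is counted per token position, and a feature
    "fires" whenever its activation is strictly greater than the threshold.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 13 of which activate the feature.
sample = [0.0] * 987 + [0.4] * 13
density = activation_density(sample)
print(f"{density:.3%}")  # prints 1.300%
```

Under this reading, a density of 1.317% would mean the feature fires on roughly 13 of every 1000 sampled tokens.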