INDEX
Explanations
changes in perception and behavior regarding societal issues and individual actions
Negative Logits
itself
-0.22
उसन
-0.20
خودش
-0.20
yourself
-0.17
оно
-0.15
，它
-0.15
آن
-0.15
그는
-0.15
nó
-0.14
kendisi
-0.14
Positive Logits
their
1.44
their
1.27
Their
1.19
Their
1.18
THEIR
1.06
их
1.03
jejich
0.96
leur
0.95
leurs
0.94
loro
0.94
Activations Density 2.760%
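
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of sampled tokens on which the feature's activation is above a threshold. This definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all assumptions for illustration; the page itself does not state how its 2.760% was obtained.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition, for illustration only; the source page does not
    document its exact formula.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations of one feature over 8 tokens (made-up values).
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 tokens are active
```

Under this assumed definition, a density of 2.760% would mean the feature fires on roughly 1 in 36 sampled tokens.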