INDEX
Explanations
politically charged language and references to authority figures or governmental actions
Negative Logits
Token           Logit
Aiheesta        -0.47
RegressionTest  -0.44
care            -0.43
lieb            -0.43
caring          -0.42
ORE             -0.41
ENBERG          -0.40
龚              -0.40
зала            -0.39
ÍTULO           -0.39
Positive Logits
Token           Logit
自分も          1.17
myself          1.02
僕も            0.96
yours           0.94
own             0.93
myself          0.83
Erreferentziak  0.83
ourselves       0.81
yourself        0.78
私も            0.78
Activations Density: 0.335%