INDEX
Explanations
references to the concept of individuality and individualism
Negative Logits
bulan      -0.17
нд         -0.16
fully      -0.16
majority   -0.15
emet       -0.15
dden       -0.15
fulness    -0.15
گاه        -0.15
own        -0.14
ener       -0.14
Positive Logits
ized        0.27
/team       0.22
/group      0.21
ity         0.21
itarian     0.20
istic       0.20
ize         0.20
ization     0.19
swith       0.18
izing       0.17
Activation Density: 0.027%
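The dashboard does not state how the logit lists or the density figure are computed. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and lowest scoring tokens, and to report density as the percentage of tokens on which the feature's activation is above zero. The function names `top_logits` and `activation_density`, the matrix layout, and the toy numbers below are hypothetical and only illustrate the idea.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by the dot product of the feature's decoder
    direction with that token's unembedding column.

    Assumption: `unembed` is laid out as d_model rows x vocab_size columns.
    Returns (positive, negative): the k most promoted tokens (highest first)
    and the k most suppressed tokens (lowest first).
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary
# (hypothetical values, not taken from the dashboard above).
vocab = ["ized", "/team", "bulan", "own"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.2, 0.1, -0.1, 0.0],
    [0.1, 0.2, -0.1, -0.2],
]
positive, negative = top_logits(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in positive])   # ['ized', '/team']
print([tok for tok, _ in negative])   # ['bulan', 'own']
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.3]))  # 40.0
```

Under this sketch a very low density such as 0.027% means the feature is nonzero on roughly 27 of every 100,000 tokens, which is typical of a narrowly specialized feature.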