Explanations
concepts related to pride and self-expression
Negative Logits
gaard         -0.15
ifar          -0.15
geois         -0.14
rei           -0.14
Ðİ            -0.14
ENCY          -0.14
PLL           -0.14
yourselves    -0.14
onth          -0.14
swith         -0.13
Positive Logits
his           0.38
its           0.37
sua           0.36
seus          0.34
their         0.34
suas          0.33
seu           0.30
suoi          0.30
suo           0.29
jego          0.28
Activations Density 0.469%