Explanations
terms related to individualism and personal identity
Negative Logits
Token       Logit
/he         -0.17
upy         -0.16
§           -0.16
enia        -0.16
fait        -0.15
atology     -0.15
á»ķ         -0.15
own         -0.15
lix         -0.14
_FLOW       -0.14
Positive Logits
Token       Logit
ized        0.37
ised        0.32
istic       0.29
ity         0.27
/group      0.25
IZED        0.25
/team       0.23
ize         0.22
swith       0.20
izado       0.20
Activations Density: 0.025%