INDEX
Explanations
terms related to identity, social structure, and individualism
Negative Logits
Token       Logit
loff        -0.16
uces        -0.14
ìĨĶ         -0.14
parable     -0.14
ificates    -0.14
ographed    -0.14
าà¹Ģล       -0.14
lys         -0.13
ej          -0.13
onaut       -0.13
Positive Logits
Token       Logit
ness        0.46
ity         0.44
ism         0.36
ality       0.33
fulness     0.29
iness       0.28
itude       0.28
NESS        0.28
hood        0.28
ification   0.27
Activations Density: 0.213%