Explanations
references to individualism and personal identity
Negative Logits
Token           Logit
舺              -0.46
freue           -0.41
oredCriteria    -0.40
enschein        -0.39
úrese           -0.39
hmmmm           -0.38
Begriffsklä     -0.38
few             -0.37
Française       -0.36
новниш          -0.36
Positive Logits
Token               Logit
personal            0.65
hoeddwyd            0.57
personal            0.57
individual          0.56
individuale         0.53
Personal            0.52
PyExc               0.52
setVerticalGroup    0.51
Personal            0.50
individually        0.50
Activations Density: 0.299%