Explanations
concepts related to human behavior and social dynamics
Negative Logits
Token          Logit
informatics    -0.51
ET             -0.49
<<             -0.48
大力           -0.48
modernized     -0.47
==""){         -0.46
ctomy          -0.46
exemplary      -0.45
--}}           -0.44
Integrity      -0.43

Positive Logits
Token          Logit
human           1.48
humans          1.30
human           1.25
humana          1.24
humaine         1.24
menschliche     1.22
humain          1.17
humanas         1.16
humano          1.16
Humans          1.15

Activation density: 0.385%
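The page does not define activation density. A common reading is the share of token positions on which the feature's activation exceeds zero. The sketch below assumes that definition; the function name `activation_density` and the `threshold` parameter are hypothetical, not part of the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature is active.

    Assumption: "active" means activation > threshold (default 0.0).
    The dashboard's exact definition and threshold are not stated.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 4 tokens: only one is above zero.
example = [0.0, 0.0, 2.7, 0.0]
print(f"{activation_density(example):.3%}")  # prints 25.000%
```

Under this reading, the reported 0.385% means the feature fires on roughly 1 in 260 tokens (1 / 0.00385 ≈ 259.7).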