INDEX
Explanations
terms related to social concepts and their underlying structures
Negative Logits
anoia     -0.18
ãĤ£       -0.17
ancode    -0.16
iki       -0.15
utow      -0.15
apolis    -0.15
edList    -0.14
eday      -0.14
bih       -0.14
asto      -0.14
Positive Logits
superf    0.14
nes       0.14
oles      0.14
Ud        0.13
oren      0.13
_foreign  0.13
olio      0.13
_MPI      0.13
investig  0.13
otten     0.13
Activations Density 0.005%