INDEX
Explanations
references to actions, entities, and elements indicative of social interactions and relationships
Negative Logits
departure       -0.14
uge             -0.14
925             -0.13
ssf             -0.13
fond            -0.13
emm             -0.13
departamento    -0.13
xương           -0.13
Depart          -0.13
((!             -0.12
Positive Logits
raquo           0.17
nin             0.16
agged           0.15
ingo            0.15
adin            0.15
ustos           0.15
reak            0.14
lington         0.14
ayo             0.14
olumn           0.14
Activations Density: 0.089%