INDEX
Explanations
references to personal development and social responsibility
Negative Logits
guy     -0.17
vek     -0.16
ile     -0.16
guys    -0.15
ond     -0.15
anc     -0.15
fav     -0.15
nat     -0.14
hips    -0.14
zie     -0.14
Positive Logits
citizens        0.29
citizen         0.27
Cit             0.27
Citizens        0.26
contributing    0.26
cit             0.25
Citizen         0.25
responsible     0.24
citiz           0.23
productive      0.23
Activations Density: 0.155%