Explanations
expressions of personal identity and self-description related to gender and interests
Negative Logits
/Peak         -0.14
ÑĦа           -0.14
.EventType    -0.14
\Blueprint    -0.13
munition      -0.13
лÑİб          -0.13
agged         -0.13
italic        -0.13
ÏģÏī          -0.13
çħ§           -0.13
Positive Logits
Exposure      0.17
society       0.16
modern        0.16
ociety        0.15
surround      0.15
exposure      0.15
urban         0.15
ults          0.15
surrounded    0.15
ensitive      0.15
Activations Density: 0.041%