Explanation
Themes related to identity and social belonging.
Negative Logits
Token          Logit
FLAG           -0.16
flagged        -0.14
ih             -0.14
discrim        -0.14
mitt           -0.14
hiba           -0.14
mineral        -0.14
rootReducer    -0.13
omez           -0.13
Flag           -0.13
Positive Logits
Token          Logit
peer           0.25
Peer           0.22
peer           0.22
Peer           0.21
-peer          0.21
Pressure       0.19
Pressure       0.18
herd           0.18
conformity     0.18
pressure       0.18
Activation Density: 0.123%