INDEX
Explanations
references to studies and research findings related to significant societal issues
Head Attr Weights
0:0.08
1:0.01
2:0.08
3:0.42
4:0.04
5:0.09
6:0.03
7:0.04
8:0.06
9:0.02
10:0.05
11:0.03
Negative Logits
selfies       -2.23
attire        -2.22
holidays      -2.11
badge         -2.10
downtime      -2.09
holiday       -2.08
merchandise   -2.07
curfew        -2.07
festivities   -2.06
glory         -2.05
Positive Logits
researcher     4.05
earcher        3.87
economist      3.65
psychologist   3.64
biologist      3.55
researchers    3.40
Researchers    3.32
theorist       3.25
Economist      3.25
reviewer       3.24
Activations Density 0.760%