INDEX
Explanations
references to human needs and societal issues
Head Attr Weights
0:0.02
1:0.06
2:0.16
3:0.02
4:0.02
5:0.07
6:0.13
7:0.07
8:0.05
9:0.04
10:0.05
11:0.23
Negative Logits
classmate   -1.50
Sweeney     -1.38
teammate    -1.37
Hanson      -1.34
Scorp       -1.34
bull        -1.32
Irving      -1.28
backdrop    -1.27
Chair       -1.26
laureate    -1.25
Positive Logits
pees        1.81
therap      1.59
seiz        1.55
condem      1.54
Sov         1.52
inished     1.50
WAYS        1.40
perished    1.39
answ        1.38
insured     1.34
Activations Density 0.011%