Explanations
terms related to social and emotional issues
Head Attr Weights
Head    Weight
0       0.07
1       0.03
2       0.02
3       0.09
4       0.41
5       0.06
6       0.04
7       0.02
8       0.06
9       0.10
10      0.02
11      0.02
Negative Logits
Token       Logit
emphasis    -2.84
JP          -2.63
OPLE        -2.47
Figure      -2.44
Badge       -2.42
:[          -2.36
Mund        -2.34
Tee         -2.31
cu          -2.27
Quote       -2.25
Positive Logits
Token       Logit
istical     2.41
kefeller    2.38
iera        2.38
ageddon     2.22
ufact       2.19
owl         2.17
estine      2.17
icho        2.14
leground    2.14
urned       2.11
Activations Density: 0.013%