Explanations
questions directed at the audience
Head Attr Weights
0:0.02
1:0.01
2:0.08
3:0.06
4:0.07
5:0.03
6:0.04
7:0.39
8:0.04
9:0.04
10:0.07
11:0.11
Negative Logits
ante        -1.71
encers      -1.53
akery       -1.51
cases       -1.49
bery        -1.49
vity        -1.48
internet    -1.46
pools       -1.45
ema         -1.44
iversity    -1.42
Positive Logits
favorably            1.79
outcome              1.62
constitu             1.62
pros                 1.57
aloud                1.56
sugg                 1.52
proposition          1.50
externalToEVAOnly    1.43
Rated                1.43
Zup                  1.40
Activations Density 0.005%