INDEX
Explanations
references to animals and their interactions with humans
Head Attr Weights
0:0.02
1:0.02
2:0.24
3:0.18
4:0.06
5:0.04
6:0.05
7:0.03
8:0.05
9:0.09
10:0.10
11:0.05
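A minimal sketch for reading the weights above, assuming each entry is a `head:weight` pair: it ranks the heads and reports how much of the listed total the top two account for. The names `head_attr_weights` and `rank_heads` are hypothetical and exist only for this illustration; they are not part of any tool's API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.02, 1: 0.02, 2: 0.24, 3: 0.18, 4: 0.06, 5: 0.04,
    6: 0.05, 7: 0.03, 8: 0.05, 9: 0.09, 10: 0.10, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_two = ranked[:2]

# The listed values are rounded, so the total need not be exactly 1.
total = sum(head_attr_weights.values())
top_two_share = sum(w for _, w in top_two) / total

print("Top two heads:", top_two)
print("Sum of listed weights:", round(total, 2))
print("Share held by top two heads:", round(top_two_share, 3))
```

On these numbers, heads 2 and 3 carry the largest weights, and the listed values sum to 0.93 rather than 1, which is consistent with rounding to two decimals.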
Negative Logits
beforehand: -1.43
intervening: -1.32
Agility: -1.32
ADRA: -1.29
ausp: -1.27
Guinness: -1.26
ADA: -1.25
therein: -1.25
Vern: -1.24
Acknowled: -1.22
Positive Logits
perse: 1.67
widget: 1.64
agate: 1.63
ugu: 1.60
"]=>: 1.57
effects: 1.46
flush: 1.42
ui: 1.42
ovie: 1.42
=(: 1.40
Activations Density 0.012%