INDEX
Explanations
phrases predicting future outcomes or states
Head Attr Weights
0:0.01
1:0.01
2:0.22
3:0.04
4:0.05
5:0.02
6:0.26
7:0.20
8:0.02
9:0.03
10:0.04
11:0.05
Negative Logits
erity: -1.52
Vers: -1.48
president: -1.37
Bet: -1.36
Pax: -1.36
fml: -1.34
ipolar: -1.33
acia: -1.32
ocity: -1.32
ghazi: -1.31
Positive Logits
mma: 1.68
lishes: 1.53
Tatt: 1.48
rated: 1.40
Mean: 1.35
Venezuel: 1.34
define: 1.34
clips: 1.34
metic: 1.27
Eternity: 1.27
Activations Density 0.002%