INDEX
Explanations
references to dialogues or interviews involving individuals
Head Attr Weights
0:0.02
1:0.01
2:0.07
3:0.11
4:0.14
5:0.02
6:0.03
7:0.13
8:0.02
9:0.03
10:0.12
11:0.24
Negative Logits
�                 -1.47
reply             -1.45
Activity          -1.45
greeting          -1.40
entertained       -1.40
Talking           -1.37
realDonaldTrump   -1.34
reetings          -1.34
convers           -1.32
OTE               -1.31
Positive Logits
rarity      1.55
Alb         1.43
nesses      1.35
PSP         1.32
phis        1.32
eria        1.30
penalties   1.28
Lump        1.27
Fract       1.26
Ratio       1.23
Activations Density 0.066%