INDEX
Explanations
phrases related to the concept of belonging or affiliation
Head Attr Weights
0:0.02
1:0.06
2:0.20
3:0.30
4:0.01
5:0.02
6:0.10
7:0.06
8:0.03
9:0.03
10:0.09
11:0.03
Negative Logits
essim       -1.09
fragrance   -1.08
ascript     -1.04
Restaur     -1.03
nutrition   -1.03
places      -1.00
crashes     -0.99
physi       -0.98
yrinth      -0.98
rapes       -0.98
Positive Logits
worldly     1.28
Century     1.24
aldo        1.15
']          1.14
agine       1.12
.]          1.08
itude       1.07
alan        1.06
hing        1.05
.}          1.04
Activations Density 0.004%