INDEX
Explanations
terms that denote rarity or commonality
Head Attr Weights
0:0.18
1:0.08
2:0.13
3:0.08
4:0.02
5:0.06
6:0.03
7:0.01
8:0.16
9:0.06
10:0.04
11:0.09
Negative Logits
visas:-1.55
mate:-1.51
vant:-1.42
balances:-1.39
accompanied:-1.38
Coat:-1.37
bearer:-1.36
onom:-1.36
detail:-1.35
scales:-1.35
Positive Logits
�:1.80
�:1.74
�士:1.69
�:1.63
�:1.61
�:1.57
�:1.54
��:1.53
�:1.52
Daryl:1.48
Activations Density 0.001%