INDEX
Explanations
phrases expressing identity and belonging
Head Attr Weights
0:0.02
1:0.01
2:0.18
3:0.07
4:0.11
5:0.03
6:0.09
7:0.07
8:0.10
9:0.04
10:0.11
11:0.11
Negative Logits
opened: -1.78
activated: -1.71
urned: -1.62
opped: -1.61
pressed: -1.55
reviewed: -1.53
Sparks: -1.47
onest: -1.45
CPC: -1.44
Downs: -1.44
Positive Logits
BILITY: 1.85
ahime: 1.77
iasm: 1.66
Appearance: 1.62
...]: 1.59
infer: 1.55
unemploy: 1.55
GW: 1.53
ADS: 1.52
ModLoader: 1.52
Activations Density 0.001%