INDEX
Explanations
repeated phrases, particularly definite articles and demonstratives in various contexts
Head Attr Weights
0:0.41
1:0.02
2:0.01
3:0.10
4:0.03
5:0.10
6:0.03
7:0.03
8:0.14
9:0.04
10:0.01
11:0.03
Negative Logits
understatement:-1.66
erity:-1.64
Mane:-1.50
Argument:-1.49
Delay:-1.46
ionage:-1.45
atto:-1.44
Demand:-1.44
Honour:-1.43
Architects:-1.41
Positive Logits
corners:2.24
corridors:2.05
yards:1.84
halls:1.82
necks:1.80
yards:1.79
valleys:1.75
corner:1.71
blocks:1.67
forums:1.67
Activations Density 0.009%