Explanations
sentences that start with a period or contain punctuation
Head Attribution Weights
Head  Weight
0     0.04
1     0.02
2     0.35
3     0.05
4     0.06
5     0.04
6     0.13
7     0.04
8     0.06
9     0.04
10    0.07
11    0.04
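The dashboard lists one weight per attention head but does not say how the weights are computed. The sketch below shows one plausible computation. It assumes the feature comes from an SAE trained on the concatenated outputs of the 12 attention heads. Each head's weight is then the L2 norm of that head's slice of the decoder row, divided by the sum of the norms over all heads. The head dimension `D_HEAD`, the function name, and the random decoder row are all hypothetical.

```python
import numpy as np

N_HEADS = 12  # matches the 12 heads listed above
D_HEAD = 64   # hypothetical head dimension, assumed for illustration

def head_attribution_weights(decoder_row: np.ndarray, n_heads: int = N_HEADS) -> np.ndarray:
    """Assumed definition: split a decoder row of length n_heads * d_head into
    per-head slices and return each slice's L2 norm as a fraction of the total."""
    per_head = decoder_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)      # one norm per head
    return norms / norms.sum()                    # fractions summing to 1

# Toy decoder row in which head 2 dominates, loosely mimicking the table above.
rng = np.random.default_rng(0)
row = rng.normal(size=N_HEADS * D_HEAD)
row[2 * D_HEAD:3 * D_HEAD] *= 8.0
weights = head_attribution_weights(row)
print(np.round(weights, 2), int(weights.argmax()))
```

Normalising by the summed norms makes the weights comparable across features whose decoder rows have different overall scales.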
Negative Logits
Token          Logit
levers         -1.76
corridors      -1.68
arettes        -1.65
streams        -1.63
presentations  -1.61
channels       -1.59
nels           -1.59
shenanigans    -1.54
nesses         -1.53
sheets         -1.51
Positive Logits
Token          Logit
llah           1.67
dfx            1.63
ía             1.59
ë              1.56
Quantity       1.53
feld           1.52
KNOWN          1.50
ENA            1.43
liv            1.41
的             1.41
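The two logit lists rank vocabulary tokens by how strongly the feature's output direction pushes their logits down or up. The sketch below shows that ranking under several assumptions:

- The feature's decoder direction has already been mapped into the residual stream. For an attention-output SAE, that would mean multiplying through the heads' output projection first.
- The direction is projected directly onto an unembedding matrix `W_U`, ignoring the final layer norm.
- The function name and the toy vocabulary are hypothetical.

```python
import numpy as np

def top_logit_effects(direction, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs
    obtained by projecting a residual-stream direction onto the unembedding."""
    logits = direction @ W_U                  # shape: (d_vocab,)
    order = np.argsort(logits)                # token indices, ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 3-dimensional residual stream, 3-token vocabulary, identity unembedding.
vocab = ["a", "b", "c"]
W_U = np.eye(3)
direction = np.array([0.5, -1.0, 2.0])
negative, positive = top_logit_effects(direction, W_U, vocab, k=1)
print(negative, positive)
```

Because the ranking depends only on the weights, these lists describe what the feature would promote or suppress if it fired. They do not describe the contexts in which it actually fires.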
Activation Density: 0.006%
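A density of 0.006% corresponds to roughly 6 firing positions per 100,000 tokens, since 6 / 100,000 × 100 = 0.006. The sketch below computes that figure. It assumes density means the percentage of token positions where the feature's activation is strictly positive. The threshold parameter and the function name are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = np.asarray(acts)
    return 100.0 * float(np.mean(acts > threshold))

# 6 firing positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:6] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```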