INDEX
Explanations
instances of the letter 't'
Head Attribution Weights
Head 0: 0.04
Head 1: 0.02
Head 2: 0.29
Head 3: 0.07
Head 4: 0.06
Head 5: 0.06
Head 6: 0.10
Head 7: 0.02
Head 8: 0.09
Head 9: 0.05
Head 10: 0.05
Head 11: 0.10
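The page does not say how the per-head weights are computed. One plausible reading, assumed here and not confirmed by the source, applies to a feature trained on concatenated attention-head outputs. Under that reading, each head's weight is the L2 norm of that head's slice of the feature's decoder vector, divided by the sum of all slice norms. The function name `head_attr_weights` and the equal-slice layout are hypothetical.

```python
import math


def head_attr_weights(decoder_vec, n_heads):
    """Hypothetical per-head attribution: split a decoder vector into
    n_heads equal slices and return each slice's L2 norm as a fraction
    of the summed norms (an assumed scheme, not the dashboard's own)."""
    if n_heads <= 0 or len(decoder_vec) % n_heads != 0:
        raise ValueError("decoder length must be a positive multiple of n_heads")
    d_head = len(decoder_vec) // n_heads
    norms = [
        math.sqrt(sum(x * x for x in decoder_vec[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    # Avoid dividing by zero for an all-zero decoder vector.
    return [n / total for n in norms] if total > 0 else [0.0] * n_heads


# Toy example: 4 heads, d_head = 2; head 0 carries most of the norm.
weights = head_attr_weights([3.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], 4)
for head, w in enumerate(weights):
    print(f"Head {head}: {w:.2f}")
```

Under this assumed scheme the weights sum to 1 before rounding. Head 2's value of 0.29 above would then mean it accounts for the largest single share of the feature's decoder norm.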
Negative Logits
Loaded: -1.67
Kafka: -1.43
iery: -1.42
Hits: -1.42
Story: -1.38
idation: -1.32
ography: -1.28
HQ: -1.27
resa: -1.27
Ku: -1.26
Positive Logits
orsi: 1.86
owship: 1.77
activate: 1.73
etheless: 1.72
rouse: 1.72
viol: 1.65
norm: 1.55
recognize: 1.51
luster: 1.50
reon: 1.49
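The two logit lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to build such lists is assumed here, since the page does not state its method. It projects the feature's output direction through the unembedding matrix to get one score per token, sorts the scores, and keeps the k lowest and k highest. The function names `logit_effects` and `top_k`, and the toy matrix, are hypothetical.

```python
def logit_effects(direction, unembed):
    """Project a d_model-sized direction through an unembedding given as
    d_model rows of vocab_size floats; returns one score per token."""
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_k(tokens, scores, k):
    """Return (positive, negative): the k highest-scoring (token, score)
    pairs in descending order and the k lowest in ascending order."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy setup: d_model = 2, vocabulary of 4 tokens.
tokens = ["a", "b", "c", "d"]
unembed = [[1.0, 0.0, 3.0, 0.5],
           [0.0, 1.0, 1.0, 0.0]]
scores = logit_effects([1.0, -1.0], unembed)
positive, negative = top_k(tokens, scores, 2)
print("Positive:", positive)
print("Negative:", negative)
```

The dashboard shows ten tokens per side. That would correspond to k = 10 over the full vocabulary.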
Activation Density: 0.053%
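Activation density is read here as the percentage of sampled tokens on which the feature's activation exceeds zero. That definition is an assumption, since the page does not give it. Under it, 0.053% means the feature fires on about 53 of every 100,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`
    (assumed definition of 'activation density')."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample: the feature fires on 53 of 100,000 tokens.
sample = [0.0] * 99_947 + [0.5] * 53
print(f"Activation Density: {activation_density(sample):.3f}%")
```
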