INDEX
Explanations
instances of the word "why."
Head Attr Weights
0:0.03
1:0.03
2:0.09
3:0.15
4:0.18
5:0.02
6:0.05
7:0.04
8:0.06
9:0.05
10:0.08
11:0.17
Negative Logits
routing          -1.57
ulence           -1.54
ocalyptic        -1.49
routed           -1.48
oliath           -1.45
ernaut           -1.44
acy              -1.43
menace           -1.42
accessibility    -1.41
unbeliev         -1.38
Positive Logits
pmwiki           1.64
Fey              1.59
ⓘ                1.54
moot             1.50
Film             1.48
forth            1.47
yon              1.46
Lab              1.41
Cum              1.35
girls            1.35
Activations Density 0.002%