INDEX
Explanations
multi-word phrases that indicate structured or significant ideas
Head Attr Weights
0:0.01
1:0.01
2:0.06
3:0.04
4:0.07
5:0.02
6:0.17
7:0.40
8:0.03
9:0.03
10:0.06
11:0.07
Negative Logits
typo       -1.46
yrics      -1.45
igmatic    -1.45
Fs         -1.40
aska       -1.38
imei       -1.36
sqor       -1.34
furt       -1.33
esan       -1.27
ensical    -1.26
Positive Logits
ceilings   1.80
horizont   1.79
clus       1.66
lineback   1.58
rity       1.57
antioxid   1.56
pillars    1.54
helic      1.54
horm       1.49
pillar     1.44
Activations Density 0.001%