INDEX
Explanations
links or URLs related to websites
Head Attr Weights
0:0.15
1:0.18
2:0.04
3:0.06
4:0.02
5:0.13
6:0.05
7:0.03
8:0.05
9:0.10
10:0.07
11:0.08
Negative Logits
TAMADRA: -1.55
Parables: -1.42
Pigs: -1.42
Thrones: -1.40
merits: -1.40
faces: -1.40
Isles: -1.39
BALL: -1.39
sided: -1.37
Bucks: -1.35
Positive Logits
emp: 2.07
export: 1.83
open: 1.81
reference: 1.80
free: 1.79
tf: 1.78
git: 1.72
unc: 1.70
syn: 1.70
average: 1.69
Activations Density: 0.003%