INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.09
1:0.08
2:0.08
3:0.07
4:0.07
5:0.07
6:0.08
7:0.10
8:0.08
9:0.06
10:0.08
11:0.09
Negative Logits
adjust      -2.57
scramble    -2.50
mathemat    -2.46
reciproc    -2.40
POL         -2.39
Zurich      -2.38
Sunny       -2.26
contag      -2.25
Lew         -2.21
Irwin       -2.20
Positive Logits
album       2.77
aga         2.58
Manga       2.58
tg          2.51
atto        2.37
emo         2.37
emy         2.36
ector       2.28
hoe         2.28
ega         2.26
Activations Density: 0.000%
No Known Activations
This feature has no known activations.