INDEX
Explanations
No Explanations Found
Head Attr Weights
Head 0: 0.08
Head 1: 0.08
Head 2: 0.06
Head 3: 0.09
Head 4: 0.08
Head 5: 0.08
Head 6: 0.09
Head 7: 0.09
Head 8: 0.08
Head 9: 0.07
Head 10: 0.06
Head 11: 0.08
Negative Logits
boundaries     -2.00
independence   -1.73
reperto        -1.69
mafia          -1.64
chores         -1.61
independence   -1.56
autonomy       -1.54
booze          -1.52
aeda           -1.51
differe        -1.50
Positive Logits
Detected    1.71
aniel       1.67
lich        1.52
embed       1.49
Reve        1.42
Shade       1.39
Tweet       1.37
Shop        1.36
York        1.36
unearthed   1.35
Activation Density: 0.000%
No Known Activations
This feature has no known activations.