INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.07
2:0.08
3:0.08
4:0.07
5:0.09
6:0.08
7:0.08
8:0.08
9:0.07
10:0.09
11:0.08
Negative Logits
sein      -2.94
archive   -2.93
azes      -2.93
agos      -2.86
rosso     -2.76
aspers    -2.75
angelo    -2.73
-----     -2.65
conqu     -2.64
defense   -2.61
Positive Logits
Jaw       2.97
Clicker   2.97
Lamb      2.96
Jinn      2.95
Kw        2.88
Mulcair   2.78
~~~~      2.78
Drone     2.76
NL        2.72
Osw       2.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.