INDEX
Explanations
No Explanations Found
Head Attribution Weights
Head 0: 0.08
Head 1: 0.08
Head 2: 0.08
Head 3: 0.07
Head 4: 0.09
Head 5: 0.08
Head 6: 0.08
Head 7: 0.08
Head 8: 0.08
Head 9: 0.07
Head 10: 0.07
Head 11: 0.08
Negative Logits
Colleg      -2.71
anton       -2.52
Dru         -2.50
cies        -2.42
arat        -2.39
akespeare   -2.35
conom       -2.31
inen        -2.29
iership     -2.29
Bru         -2.29
Positive Logits
!--         2.99
!/          2.94
:(          2.88
-->         2.86
eks         2.66
)--         2.52
KH          2.51
=/          2.49
++++        2.45
=>          2.45
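Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token logits. The following is a minimal sketch of that idea under stated assumptions: the decoder direction is taken to already live in residual-stream space (a feature from an attention-output dictionary would first need mapping through the output projection), the final layer norm is ignored, and the function name `top_logit_tokens`, the toy vocabulary, and the toy matrices are all hypothetical, not this page's data or any library's API.

```python
import numpy as np


def top_logit_tokens(w_dec, w_u, vocab, k=3):
    """Hypothetical helper: project a feature's decoder direction through the
    unembedding and return the k most negative and k most positive
    (token, logit) pairs.

    Assumes w_dec has shape (d_model,) and w_u has shape (d_model, vocab_size);
    layer norm and any attention output projection are ignored.
    """
    logits = w_dec @ w_u  # per-token logit contribution, shape (vocab_size,)
    order = np.argsort(logits)  # token indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example (hypothetical): d_model = 2, a vocabulary of 5 tokens.
vocab = ["a", "b", "c", "d", "e"]
w_u = np.array([[1.0, -1.0, 0.5, 2.0, -2.0],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
w_dec = np.array([1.0, 0.0])

neg, pos = top_logit_tokens(w_dec, w_u, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because only a matrix-vector product and a sort are involved, the lists depend solely on the learned weights, which is why a page can show logits even for a feature with no recorded activations.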
Activation Density: 0.000%
No Known Activations
This feature has no known activations.