INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.07
2:0.07
3:0.08
4:0.08
5:0.08
6:0.08
7:0.06
8:0.09
9:0.09
10:0.08
11:0.08
Negative Logits
Sov: -1.44
typew: -1.38
Sally: -1.31
inacc: -1.26
utra: -1.25
rehe: -1.23
Downloadha: -1.22
Wikipedia: -1.19
Repe: -1.19
clerks: -1.16
Positive Logits
oor: 1.31
kind: 1.30
strate: 1.30
dign: 1.30
viron: 1.29
stars: 1.28
acly: 1.28
inge: 1.28
zzy: 1.27
ulent: 1.27
Activations Density: 0.000%
No Known Activations
This feature has no known activations.