INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.06
2:0.08
3:0.09
4:0.06
5:0.09
6:0.09
7:0.07
8:0.08
9:0.09
10:0.07
11:0.09
Negative Logits
constitu     -2.03
Flavoring    -1.92
Belief       -1.86
diapers      -1.80
thood        -1.70
afety        -1.63
rika         -1.60
othing       -1.59
behavior     -1.52
illusions    -1.51
Positive Logits
srfAttach    2.01
archive      1.87
catentry     1.79
intendent    1.60
bidder       1.56
pivot        1.49
vet          1.46
threaten     1.44
langu        1.44
backdrop     1.42
Activations Density 0.000%
No Known Activations
This feature has no known activations.