INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.07
1:0.04
2:0.09
3:0.08
4:0.09
5:0.08
6:0.08
7:0.08
8:0.09
9:0.08
10:0.09
11:0.08
Negative Logits
beauty        -1.78
scoreboard    -1.78
vanity        -1.77
bland         -1.75
hindsight     -1.70
disguise      -1.69
simplicity    -1.68
ouston        -1.64
mah           -1.62
tru           -1.60
Positive Logits
atonin        1.95
osponsors     1.86
acteria       1.85
directed      1.85
acter         1.82
ordering      1.72
Transfer      1.71
RNA           1.67
osuke         1.66
acterial      1.63
Activations Density 0.000%
This feature has no known activations.