INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.09
    1:0.07
    2:0.08
    3:0.07
    4:0.08
    5:0.08
    6:0.08
    7:0.08
    8:0.08
    9:0.08
    10:0.09
    11:0.08
    Negative Logits
     Sailor
    -2.68
     Reilly
    -2.65
     cancelled
    -2.62
     showc
    -2.60
     delinquent
    -2.49
     stranded
    -2.49
     Tolkien
    -2.45
     Stevenson
    -2.44
     fixture
    -2.42
     haunt
    -2.38
    Positive Logits
     architectures
    2.83
    2.82
    2.79
    2.73
     praising
    2.72
    [undecodable token]
    2.70
    2.69
    2.67
    Deploy
    2.62
    2.59
    Act Density 0.000%

    This feature has no known activations.