INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    Head    0     1     2     3     4     5     6     7     8     9     10    11
    Weight  0.08  0.08  0.08  0.07  0.09  0.08  0.08  0.08  0.08  0.07  0.07  0.08
    Negative Logits
    Token          Logit
    " Colleg"      -2.71
    "anton"        -2.52
    " Dru"         -2.50
    "cies"         -2.42
    "arat"         -2.39
    "akespeare"    -2.35
    "conom"        -2.31
    "inen"         -2.29
    "iership"      -2.29
    " Bru"         -2.29
    Positive Logits
    Token          Logit
    "!--"          2.99
    "!/"           2.94
    ":("           2.88
    "-->"          2.86
    "eks"          2.66
    ")--"          2.52
    " KH"          2.51
    "=/"           2.49
    "++++"         2.45
    "=>"           2.45
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.