INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.08
    2:0.06
    3:0.09
    4:0.08
    5:0.08
    6:0.09
    7:0.09
    8:0.08
    9:0.07
    10:0.06
    11:0.08
    Negative Logits
    " boundaries": -2.00
    " independence": -1.73
    " reperto": -1.69
    " mafia": -1.64
    " chores": -1.61
    "independence": -1.56
    " autonomy": -1.54
    " booze": -1.52
    "aeda": -1.51
    " differe": -1.50
    Positive Logits
    " Detected": 1.71
    "aniel": 1.67
    "lich": 1.52
    " embed": 1.49
    " Reve": 1.42
    " Shade": 1.39
    " Tweet": 1.37
    " Shop": 1.36
    "York": 1.36
    " unearthed": 1.35
    Act Density 0.000%

    No Known Activations
