INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.08
    2:0.08
    3:0.07
    4:0.08
    5:0.08
    6:0.08
    7:0.08
    8:0.08
    9:0.06
    10:0.09
    11:0.08
    Negative Logits
    " Tit"         -1.61
    " Irwin"       -1.48
    " Tut"         -1.47
    " Amos"        -1.44
    " Romanian"    -1.44
    "UGH"          -1.43
    "ammy"         -1.42
    "uggle"        -1.40
    "vous"         -1.38
    "igne"         -1.38
    Positive Logits
    "Spoiler"      1.56
    "mand"         1.55
    "BIL"          1.54
    "Enough"       1.54
    "Vill"         1.52
    "BU"           1.48
    "grow"         1.47
    "ailable"      1.47
    "Plot"         1.46
    " monarchy"    1.45
    Act Density: 0.000%

    No Known Activations

    This feature has no known activations.