INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.07
    1:0.07
    2:0.09
    3:0.08
    4:0.09
    5:0.08
    6:0.08
    7:0.07
    8:0.08
    9:0.07
    10:0.08
    11:0.08
    Negative Logits
    " Translation"  -3.63
    " Ng"           -2.81
    "translation"   -2.77
    " Verse"        -2.76
    "tera"          -2.63
    "ño"            -2.57
    " Literature"   -2.57
    "quez"          -2.53
    " Nun"          -2.53
    " Buk"          -2.50
    Positive Logits
    " Costco"     2.77
    " dining"     2.58
    " behavi"     2.52
    " socially"   2.48
    " athlet"     2.46
    " departing"  2.46
    " corros"     2.43
    " college"    2.41
    "ackle"       2.33
    " sever"      2.31
    Act Density 0.000%

    No Known Activations
