INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "bum"          -0.65
    " Holo"        -0.61
    " maiden"      -0.58
    "riel"         -0.57
    "ocial"        -0.57
    " outdoors"    -0.57
    ")</"          -0.56
    "[_"           -0.56
    "iew"          -0.54
    " timid"       -0.54
    Positive Logits
    "ioned"         0.87
    "pees"          0.75
    "isson"         0.72
    " Meaning"      0.71
    "llah"          0.69
    "zers"          0.68
    "alties"        0.66
    "andem"         0.66
    "strom"         0.66
    "rons"          0.65
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.