INDEX
    Explanations
    No Explanations Found
    Negative Logits
    " Ferr"        -0.72
    "mbuds"        -0.71
    " torches"     -0.69
    " Canaver"     -0.68
    " Volvo"       -0.67
    "ARM"          -0.64
    "gro"          -0.64
    "OVER"         -0.63
    "venants"      -0.62
    "framework"    -0.61
    Positive Logits
    "ividual"       0.70
    "ute"           0.66
    "umble"         0.65
    "uble"          0.65
    " pts"          0.65
    " pair"         0.65
    "hower"         0.64
    "uben"          0.64
    "ients"         0.63
    "hots"          0.62
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.