    Explanations
    No Explanations Found
    Negative Logits
    halla           -0.80
     glim           -0.79
     pse            -0.75
     volunt         -0.74
     advoc          -0.74
     enthusi        -0.74
     blat           -0.71
     unlaw          -0.69
     manslaughter   -0.68
    jri             -0.68
    Positive Logits
     favorite       0.90
     favorites      0.86
     Favorite       0.78
    favorite        0.74
    fw              0.69
    MAC             0.69
    Cat             0.65
    ネ               0.63
    黒               0.62
    [_              0.62
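    The page does not say how the logit values were computed. One common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and list the lowest- and highest-scoring vocabulary tokens. Below is a minimal pure-Python sketch under that assumption. The names `direct_logits`, `top_bottom_tokens`, `w_dec_f`, and `W_U` are hypothetical, and the toy matrix is illustrative, not data from this feature.

```python
from typing import List, Sequence, Tuple


def direct_logits(
    w_dec_f: Sequence[float], W_U: Sequence[Sequence[float]]
) -> List[float]:
    """Dot a feature's decoder direction with every column of W_U.

    Assumed (hypothetical) shapes: w_dec_f has length d_model and
    W_U is a d_model x d_vocab matrix stored as a list of rows.
    """
    d_vocab = len(W_U[0])
    return [
        sum(w_dec_f[i] * W_U[i][j] for i in range(len(w_dec_f)))
        for j in range(d_vocab)
    ]


def top_bottom_tokens(
    w_dec_f: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (lowest, highest): the k lowest- and k highest-scoring tokens."""
    logits = direct_logits(w_dec_f, W_U)
    # Token indices sorted by score, ascending.
    order = sorted(range(len(logits)), key=lambda j: logits[j])
    lowest = [(vocab[j], logits[j]) for j in order[:k]]
    # Highest first, mirroring the "Positive Logits" list.
    highest = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return lowest, highest


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 made-up tokens.
    toy_W_U = [
        [0.5, -1.0, 2.0, -0.5],
        [1.0, 0.0, -0.5, 0.25],
    ]
    toy_vocab = ["a", "b", "c", "d"]
    low, high = top_bottom_tokens([1.0, 0.5], toy_W_U, toy_vocab, k=2)
    print("lowest:", low)    # [('b', -1.0), ('d', -0.375)]
    print("highest:", high)  # [('c', 1.75), ('a', 1.0)]
```

    Sorting the whole vocabulary keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` with a `key` avoid the full sort.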
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.