INDEX
    Explanations
    No Explanations Found
    Negative Logits
    "oux"         -0.72
    "illon"       -0.69
    "ylan"        -0.68
    " Boll"       -0.66
    "psey"        -0.66
    "pret"        -0.63
    "onis"        -0.62
    "ancy"        -0.62
    " Wim"        -0.61
    " Painter"    -0.61
    Positive Logits
    " following"   1.03
    " exact"       0.95
    "follow"       0.75
    "sites"        0.73
    "LLOW"         0.71
    "äºĶ"          0.71
    " follow"      0.66
    "î"            0.65
    "ó"            0.64
    "spir"         0.64
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.