    Negative Logits
    Token             Logit
    " college"        -0.07
    "Language"        -0.07
    "stry"            -0.07
    " Van"            -0.07
    "_CANNOT"         -0.06
    " Anime"          -0.06
    "Loop"            -0.06
    " bedrooms"       -0.06
    " triangles"      -0.06
    " Sub"            -0.06
    Positive Logits
    Token             Logit
    " GPI"            0.06
    " ISS"            0.06
    " Ό"              0.06
    "-sm"             0.06
    " inconsistent"   0.06
    " heterogeneous"  0.06
    "$order"          0.06
    " mystical"       0.06
    " CLOSE"          0.06
    (no token shown)  0.06
    Act Density 0.017%

    No Known Activations