    Negative Logits
    "idency"        -0.93
    "ushima"        -0.77
    "andem"         -0.76
    "idential"      -0.75
    "FINE"          -0.74
    "andro"         -0.72
    "arios"         -0.71
    " Ces"          -0.71
    " unregulated"  -0.69
    "ilde"          -0.69
    Positive Logits
    "bird"          1.43
    "birds"         1.21
    " bird"         1.12
    " Bird"         1.06
    "Bird"          0.99
    "hawk"          0.96
    " owl"          0.93
    "bats"          0.92
    " Birds"        0.90
    " birds"        0.89
    Activation Density: 0.007%

    No Known Activations