INDEX
    NEGATIVE LOGITS
    Token             Logit
    "bby"             -0.57
    "dious"           -0.56
    " very"           -0.53
    " neat"           -0.50
    "↵↵"              -0.49
    " the"            -0.46
    " vag"            -0.42
    " two"            -0.42
    " more"           -0.41
    "ницу"            -0.41
    POSITIVE LOGITS
    Token             Logit
    "Rhestr"          0.89
    " photolibrary"   0.85
    " raiſ"           0.85
    " Houſe"          0.78
    " Meksiku"        0.75
    "ftagPool"        0.75
    "noons"           0.75
    " Anſ"            0.74
    " himſelf"        0.73
    "ItemLayout"      0.72
    Activation Density: 0.262%

    No Known Activations