INDEX
    Explanations

    Abstract concepts

    Negative Logits
     Efq        -1.16
    ✭✭          -1.04
     ſche       -0.94
     ſeveral    -0.93
    сылкі       -0.90
     myſelf     -0.90
     Anſ        -0.90
     itſelf     -0.88
     raiſ       -0.88
     chofe      -0.86
    Positive Logits
     her        0.58
     true       0.52
     so         0.50
     main       0.48
     purpose    0.48
     to         0.47
     top        0.47
     en         0.47
     off        0.47
     visit      0.47
    Act Density 0.057%

    No Known Activations