INDEX
    Explanations

    references to individuals and their actions in various contexts

    Negative Logits
     fun    -0.51
     y      -0.46
     is     -0.46
     I      -0.45
     Y      -0.44
     Fun    -0.42
     (      -0.42
    buie    -0.42
     +      -0.42
    ابر     -0.40
    Positive Logits
    ftagPool        0.87
     $_"            0.83
     ſhe            0.80
     Efq            0.78
     itſelf         0.77
    ScopeManager    0.76
     doubtnut       0.75
     Majefty        0.74
    saraba          0.74
     becauſe        0.73
    Act Density 0.345%

    No Known Activations