INDEX
    Explanations

    repeated instances of the word "simple" in various contexts

    Negative Logits
    " litt"    -0.15
    "ξε"       -0.15
    "um"       -0.15
    "ONA"      -0.15
    "ET"       -0.14
    "uste"     -0.14
    " inf"     -0.14
    "rix"      -0.14
    " Sou"     -0.14
    " meer"    -0.14
    Positive Logits
    "°}"       0.15
    "oyer"     0.15
    "#"        0.14
    "gend"     0.14
    "vir"      0.14
    "celik"    0.14
    "@js"      0.14
    "catalog"  0.14
    "cul"      0.14
    "泳"       0.14
    Act Density 0.009%

    No Known Activations