INDEX
    Explanations

    specific strings like 'ON' or 'in' within a text

    specific prepositions and their variations

    Negative Logits
    ' Revelations'    -0.74
    ' Quadro'         -0.68
    'ATTLE'           -0.66
    ' laughter'       -0.64
    'ij士'            -0.63
    'enance'          -0.63
    ' Penguins'       -0.62
    ' ---------'      -0.59
    ' exits'          -0.58
    ' Buildings'      -0.58
    Positive Logits
    'jin'             1.04
    'ichi'            0.98
    'ghai'            0.92
    'kered'           0.92
    'hiro'            0.91
    'ji'              0.91
    'ju'              0.88
    'hao'             0.88
    'ori'             0.88
    'nan'             0.87
    Activation Density: 0.095%

    No Known Activations