    Explanations

    references to "Our" or possessive pronouns indicating belonging or connection

    Negative Logits
    "ãĥ³ãĥĹ"    -0.15
    "vell"      -0.15
    "399"       -0.15
    "ãĥ¥ãĥ¼"    -0.15
    "elly"      -0.15
    "antes"     -0.15
    "annt"      -0.14
    "ampler"    -0.14
    "/device"   -0.14
    "antas"     -0.14
    Positive Logits
    " Light"    0.18
    "Lite"      0.17
    " light"    0.17
    "_light"    0.17
    "át"        0.15
    "Light"     0.15
    " lights"   0.15
    "éħ"        0.15
    "LIGHT"     0.15
    "åIJ"       0.15
    Activation Density: 0.047%

    No Known Activations