INDEX
    Explanations

    references to social or cultural commentary

    Negative Logits
    Token             Logit
    " meanwhile"      -0.16
    " however"        -0.16
    ","               -0.16
    " mixed"          -0.15
    "athe"            -0.14
    "i"               -0.14
    "alth"            -0.14
    "isan"            -0.14
    "idal"            -0.14
    "wert"            -0.14

    Positive Logits
    Token             Logit
    "ailability"      0.16
    "782"             0.15
    "forge"           0.15
    "REFIX"           0.14
    "AxisAlignment"   0.14
    "ụn"              0.14
    "Ä©"              0.14
    "Forge"           0.13
    " versa"          0.13
    "llib"            0.13
    Activation Density: 0.135%

    No Known Activations