    Explanations

    words related to roles or categories in specific contexts, often including names and functions

    Negative Logits
    "åde"       -0.16
    ".scala"    -0.16
    "alker"     -0.15
    "sid"       -0.15
    "aits"      -0.15
    "ạp"        -0.15
    " Gall"     -0.15
    " canvas"   -0.15
    "apper"     -0.15
    "uce"       -0.14

    Positive Logits
    " acet"     0.17
    " mart"     0.16
    " Sens"     0.16
    " defs"     0.15
    "acet"      0.15
    " troop"    0.15
    " sens"     0.15
    "ãĤīãģĹ"    0.14
    " zlib"     0.14
    "ALE"       0.14

    Act Density 0.029%

    No Known Activations