INDEX
    Explanations

    words related to specific names or titles, potentially in a playful or informal context

    Negative logits
    INU     -0.21
    IFI     -0.19
    SCRI    -0.19
    AGMA    -0.17
    IOD     -0.17
    ICLE    -0.17
    PLIC    -0.17
    ILI     -0.17
    IMIT    -0.16
    ICC     -0.16

    Positive logits
    hi       0.40
    bi       0.39
    vi       0.34
    di       0.33
    pi       0.33
    li       0.32
    ui       0.32
    ni       0.30
    ki       0.29
    ii       0.29

    Activation density: 0.032%

    No Known Activations