INDEX
    Explanations

    mentions of linear concepts or terms, particularly in mathematical or technical contexts

    Negative Logits
    amax          -0.21
     nonlinear    -0.20
    ennis         -0.18
    anela         -0.17
    ing           -0.17
    ENTE          -0.16
    ÐIJÑĢÑħÑĸв    -0.16
    yre           -0.15
    ingroup       -0.15
    ente          -0.15
    Positive Logits
    ly            0.41
    ized          0.35
    ization       0.32
    izing         0.28
    izable        0.28
    ize           0.27
    ities         0.27
    ised          0.26
    isation       0.24
    izes          0.23
    Activation Density: 0.014%

    No Known Activations