INDEX
    Explanations

    numerical values and their equivalents in various contexts

    Negative Logits
    adele      -0.15
     neutral   -0.15
    stvo       -0.14
    ruc        -0.14
    anco       -0.14
    captures   -0.14
    ansen      -0.14
    ĥĿ         -0.14
    agger      -0.14
    lette      -0.14
    Positive Logits
    uku        0.15
    FRING      0.15
    -equ       0.15
    rement     0.14
    书记       0.14
    erd        0.14
    âk         0.14
    reas       0.14
    дам        0.13
    CCR        0.13
    Act Density 0.157%

    No Known Activations