    Explanations

    strong assertions or confirmations supported by evidence

    Negative Logits
    Ĥ           -0.17
    ibe         -0.16
    gi          -0.15
    hus         -0.15
     scratches  -0.14
    imits       -0.14
    à¸Ńà¹Ģม     -0.14
    927         -0.14
    834         -0.14
    IMITER      -0.14
    Positive Logits
    ekim        0.17
     evidence   0.17
    ctic        0.16
    buah        0.15
    aken        0.15
    ktor        0.15
    amedi       0.14
    ikt         0.14
    ssel        0.14
     Cres       0.14
    Activation Density: 0.333%

    No Known Activations