    Explanations

    Authenticity vs artificiality

    Negative Logits
    Token            Logit
    " asli"          -0.08
    "દર"             -0.07
    " भ्रम"           -0.07
    " mnemonic"      -0.07
    "Fuel"           -0.07
    "П"              -0.07
    "Resizable"      -0.07
    "acebook"        -0.07
    "Asia"           -0.07
    "Mozilla"        -0.07
    Positive Logits
    Token            Logit
    " artificially"  0.14
    " künst"         0.13
    "/artificial"    0.13
    " artificial"    0.12
    " synthetic"     0.11
    " Artificial"    0.11
    " Synthetic"     0.10
    "ifice"          0.10
    "synt"           0.10
    " manufactured"  0.10
    Activation Density: 0.158%

    No Known Activations