    Explanations

    formatting related to programming code, potentially related to strings and declarations

    Negative Logits
    Token         Logit
    " exerc"      -0.76
    "andowski"    -0.73
    " Nau"        -0.68
    "ITED"        -0.68
    "ipples"      -0.66
    " Mellon"     -0.64
    " Rav"        -0.63
    " disse"      -0.63
    "æ©"          -0.62
    "ãĥīãĥ©"      -0.62
    Positive Logits
    Token         Logit
    "%%%%"        1.08
    "AppData"     0.95
    "reet"        0.94
    "imate"       0.83
    "username"    0.80
    "lu"          0.79
    "typ"         0.79
    "imates"      0.79
    "chance"      0.78
    "%%"          0.78
    Activation Density: 0.032%

    No Known Activations