INDEX
    Explanations

    human-like states compared to AI

    Negative Logits
     поба           -0.10
    indr            -0.09
    局              -0.09
     Arb            -0.09
    avery           -0.09
    aram            -0.08
    -Al             -0.08
    omi             -0.08
    ivery           -0.08
    .EventHandler   -0.08
    Positive Logits
     like           0.35
     same           0.27
     zoals          0.23
     như            0.23
    像              0.23
     way            0.22
    same            0.21
     como           0.20
     seperti        0.20
     gibi           0.19
    Act Density 0.101%

    No Known Activations