INDEX
    Explanations

    references to historical figures and cultural context

    Negative Logits
    Token               Logit
    " ancak"            -0.14
    "PasswordEncoder"   -0.13
    "ấn"                -0.13
    ".As"               -0.13
    "allenge"           -0.13
    "ijken"             -0.12
    "paged"             -0.12
    "drž"               -0.12
    "ueil"              -0.12
    "ATUS"              -0.12
    Positive Logits
    Token               Logit
    " like"             0.81
    " como"             0.75
    " comme"            0.66
    " sebagai"          0.63
    " jako"             0.61
    " Like"             0.60
    "como"              0.60
    " như"              0.58
    "Like"              0.55
    "like"              0.52
    Activation Density: 0.083%

    No Known Activations