    Explanations

    dialogue/script

    Negative Logits
    �ન                -0.08
     Split            -0.08
     MUST             -0.08
     Darkness         -0.07
     configurable     -0.07
    (blank)           -0.07
     onafhankelijk    -0.07
     Join             -0.07
    (blank)           -0.07
     negros           -0.07
    Positive Logits
    unlikely          0.09
     anymore          0.09
     😉               0.09
     😂               0.09
     dealing          0.08
     judging          0.08
     değil            0.08
     unfair           0.07
    (blank)           0.07
     rather           0.07
    Activation Density 0.113%

    No Known Activations