INDEX
    Explanations

    affirmative responses like yes or yeah

    Negative Logits
    ~~    0.44
     trecut    0.43
    受注    0.42
    CHT    0.42
    াস    0.42
    クリーム    0.41
     JESUS    0.41
     WIDTH    0.40
    LINEAR    0.40
    ?!"    0.40
    Positive Logits
    ea    0.46
    o    0.45
     cynical    0.44
    itä    0.43
    hika    0.43
     controls    0.42
    (token missing)    0.42
     haus    0.41
     lizenz    0.41
    (token missing)    0.41
    Act Density 0.001%

    No Known Activations