    Explanations

    Multiple-choice answers followed by a period

    Negative Logits
    "p"           0.41
    "igten"       0.39
    "icht"        0.38
    " bepaalde"   0.38
    "agia"        0.38
    "ёз"          0.38
    " drained"    0.37
    " trataro"    0.37
    "ómo"         0.36
    "5"           0.36
    Positive Logits
    " None"       0.54
    " none"       0.53
    " lahat"      0.50
    " nessuna"    0.50
    " всех"       0.49
    " żad"        0.49
    " ninguna"    0.48
    " всички"     0.48
    " никаких"    0.48
    " swarm"      0.48
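    Ranked lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that direct-path definition (decoder vector times unembedding, ignoring the final LayerNorm and any later layers); the page itself does not state how its values were computed, and the function name `top_logit_effects` and the toy matrices are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_feature, w_unembed, vocab, k=10):
    """Rank tokens by how strongly one feature's decoder direction raises
    (positive) or lowers (negative) their logits.

    Assumption: logit effect = w_dec_feature @ w_unembed, i.e. only the direct
    path through the unembedding is considered.
    """
    effects = w_dec_feature @ w_unembed        # one score per vocabulary token
    order = np.argsort(effects)                # indices sorted ascending by score
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example (hypothetical): a 2-dim residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec = np.array([1.0, 0.0])
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0,  3.0, 3.0, 3.0]])
pos, neg = top_logit_effects(w_dec, W_U, vocab, k=2)
print(pos)  # highest-scoring tokens first
print(neg)  # lowest-scoring tokens first
```

    Only the first row of `W_U` matters here because the toy decoder vector is zero in its second component.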
    Activation Density 0.012%
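    An activation density of 0.012% means the feature fires on roughly 12 of every 100,000 tokens. The sketch below assumes density is the percentage of token positions whose activation exceeds zero; the exact threshold and evaluation dataset behind the figure above are not given on the page, and `activation_density` is a hypothetical helper.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds
    `threshold` (assumed definition: strictly greater than zero by default)."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 12 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:12] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```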

    No Known Activations