    Explanations

    Questions starting with "are" or "does"

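    Negative Logits
    (tokens are quoted; a leading space is part of the token)
    "稱為"                   0.70
    " гэ"                    0.65
    "adikan"                 0.61
    "Throwable"              0.61
    "ствием"                 0.61
    "нкт"                    0.60
    " पकड़ा"                  0.59
    [token not recovered]    0.59
    "ன்னு"                    0.59
    " ffilm"                 0.58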
    Positive Logits
    (tokens are quoted; a leading space is part of the token)
    " Does"    4.06
    " does"    3.84
    "Does"     3.79
    " Did"     3.72
    " did"     3.64
    " Are"     3.56
    "Did"      3.50
    "Are"      3.30
    "does"     3.27
    "did"      3.01
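
The positive-logit tokens above differ only in capitalization and in whether they carry a leading space. As a rough sketch, they can be grouped by underlying word to compare against the explanation, which names "are" and "does" while the listing also contains forms of "did". The names `positive_logits` and `normalize` are hypothetical helpers introduced here for illustration; the token/value pairs are copied from the listing.

```python
from collections import Counter

# Top positive-logit tokens and their values, copied from the listing above.
# A leading space is part of the token.
positive_logits = [
    (" Does", 4.06), (" does", 3.84), ("Does", 3.79), (" Did", 3.72),
    (" did", 3.64), (" Are", 3.56), ("Did", 3.50), ("Are", 3.30),
    ("does", 3.27), ("did", 3.01),
]


def normalize(token: str) -> str:
    """Drop surrounding whitespace and case so token variants compare equal."""
    return token.strip().lower()


# Count how many of the listed tokens map to each underlying word.
word_counts = Counter(normalize(token) for token, _ in positive_logits)
print(dict(word_counts))
```

Under this grouping, every listed token reduces to one of three auxiliary verbs, which suggests the explanation could also mention "did".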
    Activation Density: 1.177%

    No Known Activations