INDEX
    Explanations

    References to questions, particularly their phrasing and structure.

    Tokens are quoted so that leading spaces are visible.

    Negative Logits
        Token              Logit
        "Personendaten"    -0.84
        "lepiej"           -0.79
        " mijne"           -0.77
        " paroisse"        -0.77
        " edn"             -0.77
        "godic"            -0.76
        "felves"           -0.76
        "]-'"              -0.75
        " gloire"          -0.74
        "بوابة"            -0.74

    Positive Logits
        Token              Logit
        " question"         2.23
        " questions"        2.22
        " Question"         2.12
        "question"          2.00
        " Questions"        1.91
        "Question"          1.90
        " QUESTION"         1.86
        "questions"         1.83
        "Questions"         1.83
        "QUESTION"          1.66
    Activation density: 0.054%
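To make the density figure concrete, here is a minimal sketch. It assumes that activation density is the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density counts strictly positive activations (threshold 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires on 54 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(54):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")  # 0.054%
```

At 0.054%, a feature fires on roughly one token in every 1,850 (100 / 0.054 ≈ 1,852). That makes it a sparse feature.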

    No Known Activations