INDEX
    Explanations

    asking questions to elicit information

    Negative Logits
     obeys          0.76
     изменение      0.75
     geschrieben    0.73
    czeniu          0.73
     zerstört       0.72
    を変更          0.71
    写的            0.71
     Nachricht      0.71
    និយាយ           0.70
     написано       0.70
    Positive Logits
     elicit         1.36
     solicit        1.33
     probing        1.32
     probe          1.31
     probes         1.27
     elic           1.26
     soliciting     1.26
     sparking       1.24
     prompting      1.24
     gauge          1.20
    Act Density 0.596%

    No Known Activations