INDEX
    Explanations

    phrases involving actions related to individuals and their interactions

    Negative Logits
     itſelf
    -0.88
     Efq
    -0.80
     (\<
    -0.73
     iſt
    -0.72
     Reſ
    -0.71
     Diſ
    -0.71
     stiefel
    -0.70
     Majefty
    -0.70
     Beſ
    -0.69
     reaſon
    -0.69
    Positive Logits
     را
    0.74
    ceğini
    0.73
    devamını
    0.70
    ığını
    0.70
    larını
    0.69
    音を
    0.69
    MENAFN
    0.68
    lerini
    0.67
    meyi
    0.67
    ätä
    0.66
    Activation Density: 0.052%

    No Known Activations