    Explanations

    specific concepts or entities

    Negative Logits
     banyak
    0.53
     Menschen
    0.48
    0.43
    ногие
    0.43
     ciertas
    0.43
     insanın
    0.42
     زیادی
    0.41
     पता
    0.41
     касается
    0.41
     incroy
    0.40
    Positive Logits
     which
    0.70
    .";
    0.70
     ซึ่ง
    0.69
    .\\
    0.69
    .");
    0.68
    0.68
    which
    0.66
    .');
    0.65
     ();
    0.64
    .",
    0.64
    Activation Density: 4.857%

    No Known Activations