INDEX
    Explanations

    trusted friend, family, or adult

    Negative Logits
    ంగా  0.86
     untersucht  0.77
    లు  0.75
    ના  0.73
    ו  0.72
    شي  0.71
    يكي  0.70
    };  0.68
    ασ  0.68
    هاي  0.68
    Positive Logits
    z  1.13
    st  0.99
     trusted  0.98
    л  0.95
    is  0.89
     Trusted  0.88
    ic  0.84
    ologist  0.84
    ate  0.80
    eg  0.80
    Activation Density: 0.002%

    No Known Activations