INDEX
    Explanations

    phrases that describe or characterize individuals and concepts

    Negative Logits
    "hoff"            -0.15
    "uju"             -0.14
    "506"             -0.14
    "vak"             -0.14
    "aser"            -0.13
    "loom"            -0.13
    "Advice"          -0.13
    "303"             -0.13
    "ating"           -0.13
    " sö"             -0.13
    Positive Logits
    " differently"    0.28
    " as"             0.24
    " sebagai"        0.21
    " как"            0.17
    " بأن"            0.17
    "曰"              0.16
    " jako"           0.16
    "clusters"        0.16
    " ως"             0.16
    " як"             0.15
    Act Density 0.104%

    No Known Activations