INDEX
    NEGATIVE LOGITS
    Token      Logit
    "ل"        0.47
    "е"        0.44
    " and"     0.43
    " եւ"      0.40
    " και"     0.39
    "ın"       0.38
    "และ"      0.38
    "سے"       0.38
    "ל"        0.38
    " ו"       0.37
    POSITIVE LOGITS
    Token              Logit
    " needing"         0.44
                       0.38
    " being"           0.38
    " addicted"        0.38
    "in"               0.37
    " apologizing"     0.37
    "ta"               0.37
    " ചെയ്യുന്ന"        0.36
    " सुर्खियों"         0.35
    " malah"           0.35
    Activation Density: 0.020%

    No Known Activations