    Explanations

    important / what I can do

    Negative Logits
    "ДК"  0.39
    "ugd"  0.39
    "ลอง"  0.38
    " Idha"  0.37
    (token not shown)  0.37
    " Bonne"  0.37
    "গার"  0.36
    " λοι"  0.36
    " मुझ"  0.35
    "试试"  0.35
    Positive Logits
    " important"  0.60
    "Instead"  0.60
    " महत्वपूर्ण"  0.58
    " önemli"  0.58
    "Important"  0.56
    " важно"  0.55
    " crucial"  0.54
    "重要な"  0.53
    " wichtigen"  0.52
    "important"  0.51
    Activation Density 0.001%

    No Known Activations