INDEX
    Explanations

    Responsibility Principle

    Negative Logits
    Token               Logit
    "ሆን"                0.39
    " respetar"         0.38
    " carving"          0.36
    " Maradona"         0.36
    " mysteriously"     0.35
    (no token shown)    0.35
    " carvings"         0.35
    " overshadow"       0.34
    "phere"             0.34
    "ளுடன்"             0.34
    Positive Logits
    Token               Logit
    "Question"          0.44
    "CO"                0.41
    "question"          0.40
    "BUF"               0.39
    "Determine"         0.39
    "Proble"            0.39
    "Вы"                0.37
    "Rates"             0.37
    (no token shown)    0.37
    "Knowledge"         0.37
    Act Density 0.000%

    No Known Activations