INDEX
    Explanations

    causal relationships or explanations in text

    Negative Logits
     ModelExpression
    -0.59
     мәкал
    -0.58
     مرئيه
    -0.56
    WireFormatLite
    -0.56
    səhifə
    -0.54
    NameInMap
    -0.53
     faſt
    -0.53
    Дерекк
    -0.52
    ロウィン
    -0.51
    -0.51
    Positive Logits
     because
    0.82
     reasons
    0.78
     porque
    0.70
     reason
    0.69
     perché
    0.66
    是因为
    0.66
    because
    0.65
     sababu
    0.65
     BECAUSE
    0.63
    Porque
    0.62
    Act Density 0.438%

    No Known Activations