INDEX
    Explanations

    hazards, dangers, and problems

    Negative Logits
    aust          0.45
    elan          0.42
     افرادی       0.41
    util          0.41
    大幅          0.40
     verwenden    0.40
    abe           0.39
    ESS           0.39
    ignan         0.39
                  0.39
    Positive Logits
     hazards      1.29
     dangers      1.23
     threats      1.23
     perils       1.17
     Threats      1.11
     Hazards      1.08
     problems     1.07
     проблемы     0.99
    problems      0.97
     evils        0.96
    Activation Density: 0.021%

    No Known Activations