INDEX
    Explanations

    philosophical problems and evil

    Negative Logits
    Token                 Value
    " berpikir"           0.43
    " controversies"      0.40
    " decisions"          0.39
    " trends"             0.39
    " Controversy"        0.39
    " attitudes"          0.38
    (token not shown)     0.38
    "谨慎"                0.38
    " গভ"                 0.37
    (token not shown)     0.37
    Positive Logits
    Token                 Value
    "Evil"                0.63
    " evil"               0.62
    " Evil"               0.55
    "evil"                0.51
    "Explain"             0.49
    " why"                0.49
    (token not shown)     0.47
    (token not shown)     0.47
    " erklären"           0.46
    "ải"                  0.43
    Act Density 0.015%

    No Known Activations