INDEX
    Explanations

    phrases or terms within quotations

    quoted phrases or expressions in the text

    Negative Logits
     "[        -0.84
     afar      -0.83
     whilst    -0.79
     preceded  -0.78
     Ubisoft   -0.76
     accomp    -0.75
     […]       -0.75
     while     -0.75
     cited     -0.75
     viewed    -0.75
    Positive Logits
    clean    1.46
    moral    1.43
    death    1.43
    safe     1.40
    reset    1.39
    smart    1.39
    zero     1.38
    Make     1.37
    safety   1.37
    comfort  1.36
    Activation Density: 0.093%

    No Known Activations