INDEX
    Explanations

    phrases that indicate reasoning or justification

    Negative Logits
     myſelf          -1.12
     Theſe           -1.11
    ArrowToggle      -1.09
     itſelf          -1.03
     theſe           -0.97
     himſelf         -0.94
     Roskov          -0.93
    RegressionTest   -0.92
     Wikimedijinoj   -0.89
     contextLoads    -0.87
    Positive Logits
     perché      0.85
     because     0.81
     Perché      0.81
     Porque      0.80
     perchè      0.77
     Because     0.76
     porque      0.76
    Because      0.74
    because      0.72
     sababu      0.72
    Activation Density 0.143%

    No Known Activations