    Explanations
    No Explanations Found
    Negative Logits
    Token          Logit
    "<bos>"        -0.52
    " ſtate"       -0.47
    " houſe"       -0.39
    "LogUtils"     -0.38
    " avenir"      -0.37
    "Retry"        -0.37
    "outheast"     -0.37
    " agujas"      -0.37
    " ſche"        -0.36
    (not shown)    -0.36

    Positive Logits
    Token          Logit
    " was"          1.10
    "was"           1.06
    " Was"          0.95
    " were"         0.91
    "Was"           0.90
    "were"          0.86
    " WAS"          0.84
    "Twas"          0.79
    " WERE"         0.79
    " была"         0.78

    Activation Density: 0.000%

    No Known Activations