INDEX
    Explanations
    No Explanations Found
    Negative Logits
     ſte    -0.45
     uſed    -0.41
     houſe    -0.40
    spaceBetween    -0.39
     perſon    -0.39
     ſtate    -0.39
    -0.39
     roul    -0.38
     corde    -0.38
    Preference    -0.38
    Positive Logits
     its    1.29
     Its    1.10
    Its    1.08
    它的    1.02
    在其    0.90
    its    0.88
    及其    0.86
    0.84
     Jego    0.84
    对其    0.82
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.