INDEX
    Explanations
    No Explanations Found
    Negative Logits
    (whitespace)    0.84
    "1"             0.77
    "2"             0.71
    (not shown)     0.68
    " a"            0.65
    " shirt"        0.64
    " и"            0.62
    " και"          0.61
    (not shown)     0.61
    " और"           0.60
    Positive Logits
    " તેની"         1.08
    " వాటి"         1.05
    "它们的"         1.01
    " njihov"       0.98
    " તેને"         0.97
    "それが"         0.96
    " त्याचे"       0.96
    "他們的"         0.95
    " Its"          0.95
    " suas"         0.94
    Activation Density: 0.004%

    No Known Activations