INDEX
    Explanations

    explaining values, concepts, or contexts

    Negative Logits
    Token           Value
    " Kom"          0.42
    " compensate"   0.42
    " Aleg"         0.40
    " kom"          0.40
    " Kamera"       0.38
    " publisher"    0.38
    " Kig"          0.38
    " Train"        0.37
    " bp"           0.37
    " compens"      0.36
    Positive Logits
    Token           Value
    "arr"           0.43
    "hydrogen"      0.43
    "فين"           0.42
    " आयरन"         0.42
    (blank)         0.41
    "quiao"         0.40
    (blank)         0.38
    "pital"         0.38
    " हाइड्रोजन"     0.37
    "вається"       0.36
    Activation Density: 0.001%

    No Known Activations