INDEX
    Explanations

    identifying unusual or specific things

    Negative Logits
    " stories"   0.48
    " torque"    0.48
    " amigo"     0.48
    " tic"       0.46
    " weight"    0.45
    "یط"         0.45
    " Ridd"      0.45
    "٘"          0.45
    " Fors"      0.44
    " praise"    0.44
    Positive Logits
    " επα"       0.46
    "čk"         0.45
    " Β"         0.43
    " δε"        0.43
    " ե"         0.43
    " Ανα"       0.42
    " ανα"       0.42
                 0.42
    " τελευτα"   0.41
    " अनुमति"    0.41
    Activation Density 0.002%

    No Known Activations