INDEX
    Explanations

    abstract concepts and states

    Negative Logits
     nutritious    0.46
     kvinn    0.46
    (no token text)    0.45
    しっかり    0.44
     nový    0.43
    '],    0.42
    (no token text)    0.42
    活躍    0.42
    ونها    0.42
    jší    0.40
    Positive Logits
     যদি    0.46
    να    0.42
    굉장    0.42
    (no token text)    0.41
    Faucet    0.41
    简直    0.41
     филосо    0.41
    (no token text)    0.40
     lmao    0.40
     דבר    0.40
    Act Density 0.046%

    No Known Activations