INDEX
    Explanations

    agents and their actions

    Negative Logits
    icycles            0.85
    endung             0.77
    ট্রোল              0.76
     nephews           0.75
     मंत्रियों           0.75
    ત્વ                0.74
     Coaching          0.74
     grandsons         0.74
     granddaughters    0.74
     pesce             0.73
    Positive Logits
     builder           2.40
     scorer            2.27
     maker             2.25
     writer            2.22
     creator           2.22
     evaluator         2.21
     renderer          2.20
     translator        2.19
     performer         2.19
     tester            2.16
    Activation Density: 1.929%

    No Known Activations