INDEX
Explanations
words indicating actions done with intent or purpose
terms related to intentional or deliberate actions
Negative Logits
Rite        -0.85
busters     -0.69
HERO        -0.67
ANGEL       -0.67
Warriors    -0.66
Tycoon      -0.65
Rated       -0.63
iry         -0.63
Colleges    -0.62
Emir        -0.62
Positive Logits
deliberately    1.05
intentionally   0.96
planted         0.92
purposely       0.90
purposefully    0.85
reprodu         0.82
plotted         0.82
disreg          0.80
indul           0.77
misrepresent    0.77
Activations Density 0.009%