INDEX
Explanations
statements regarding personal accountability and the implications of individual actions in society
Negative Logits
hammer      -0.17
adox        -0.15
.shell      -0.15
-------------------------------------------------------------------------  -0.15
uil         -0.15
orney       -0.14
pend        -0.14
uard        -0.14
orian       -0.14
-bind       -0.14
Positive Logits
engaged     0.28
commit      0.27
engage      0.26
commit      0.25
engaging    0.25
cause       0.24
Commit      0.24
resort      0.24
engages     0.24
Eng         0.23
Activations Density 0.401%