INDEX
Explanations
action verbs related to activities or accomplishments
patterns of responsibility and accountability in actions
Negative Logits
ogether         -0.68
selves          -0.64
emale           -0.63
common          -0.62
discrep         -0.60
respectively    -0.60
aminer          -0.57
joice           -0.57
earch           -0.56
Composite       -0.56
Positive Logits
himself         0.95
Himself         0.66
remorse         0.61
resign          0.59
personally      0.57
solo            0.55
his             0.54
apolog          0.54
zbollah         0.53
beard           0.52
Activations Density: 0.753%