INDEX
Explanations
phrases related to personal actions and opinions
themes related to accountability and personal responsibility
Negative Logits
)",     -0.77
},"     -0.73
")      -0.71
),"     -0.68
')      -0.68
:]      -0.67
],"     -0.63
)"      -0.62
"],"    -0.62
]"      -0.62
Positive Logits
anyways   1.31
anyway    1.06
âĢ        0.98
somew     0.93
âĻ        0.92
tho       0.90
.         0.88
anymore   0.87
!.        0.83
ðŁĺ       0.83
Activations Density 0.808%
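
The positive-logit entries âĢ, âĻ, and ðŁĺ look like mojibake, but they are consistent with how a GPT-2-style byte-level BPE vocabulary displays tokens that end partway through a multi-byte UTF-8 character. The page does not say which tokenizer produced them, so the sketch below is an assumption: it rebuilds the GPT-2 byte-to-character table and inverts it to recover the raw bytes behind each token. The helper names (`bytes_to_unicode`, `token_to_bytes`) are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard's tokenizer follows this GPT-2 convention.
    """
    # Bytes that already map to visible characters keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n so it is printable.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token string."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["âĢ", "âĻ", "ðŁĺ"]:
        print(repr(tok), token_to_bytes(tok).hex())
```

Under that assumption the three tokens decode to the byte prefixes `e2 80`, `e2 99`, and `f0 9f 98`. Each is an incomplete UTF-8 sequence: `e2 80` begins the U+2000-U+203F range (curly quotes, dashes, ellipsis), `e2 99` begins U+2640-U+267F (miscellaneous symbols such as hearts), and `f0 9f 98` begins U+1F600-U+1F63F (face emoji). Read this way, they fit alongside the informal tokens in the list such as "anyways" and "tho".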