INDEX
Explanations
sentences starting with "The truth is"
statements or assertions that express a fact or truth
Negative Logits
throp        -0.85
iates        -0.72
Improvement  -0.63
icipated     -0.62
viol         -0.61
stad         -0.61
derog        -0.61
ypes         -0.60
ivil         -0.59
violation    -0.58
Positive Logits
borne        0.78
neither      0.77
not          0.74
probably     0.73
unclear      0.72
none         0.71
indeed       0.70
undoubtedly  0.68
still        0.67
doubtless    0.67
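The token lists above pair each vocabulary item with a score. One common way feature dashboards derive such scores (assumed here, not confirmed by this page) is to project the feature's decoder direction onto each token's unembedding vector and keep the k highest and k lowest results. The sketch below uses toy two-dimensional vectors; `logit_effects`, `top_and_bottom`, and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each
    token's unembedding vector: how much the feature pushes that token's logit."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Scored, Scored]:
    """Return (k most positive, k most negative) tokens, each sorted by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example data (invented for illustration only).
decoder = [1.0, 0.0]
unembed = {
    "not": [0.7, 0.1],
    "probably": [0.5, 0.0],
    "viol": [-0.6, 0.2],
    "throp": [-0.9, 0.3],
}
positive, negative = top_and_bottom(logit_effects(decoder, unembed), k=2)
print(positive)  # [('not', 0.7), ('probably', 0.5)]
print(negative)  # [('throp', -0.9), ('viol', -0.6)]
```

Real dashboards may normalize or rescale these scores, so the toy magnitudes are not comparable to the values listed above.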
Activation Density: 0.101%
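Activation density is usually read as the share of tokens in a sample on which the feature fires; at 0.101%, that is roughly one token in a thousand. A minimal sketch under that reading, assuming "fires" means an activation above zero (the threshold and the `activation_density` name are assumptions, not taken from this page):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition; dashboards may use a different cutoff)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of 1,000 gives a density of about 0.1%.
acts = [0.0] * 999 + [2.5]
print(activation_density(acts))
```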