INDEX
Explanations
phrases or words related to admissions of guilt or ownership
instances of the word "admit" and its variations, indicating admissions of guilt or truthfulness
Negative Logits
ILCS        -0.82
rior        -0.71
osi         -0.70
ighth       -0.68
acements    -0.67
atility     -0.66
ãĤ¡         -0.65
asus        -0.64
rouse       -0.63
adesh       -0.63
Positive Logits
wrongdoing      1.07
defeat          1.01
guilt           0.99
responsibility  0.78
ignorance       0.77
admitting       0.77
mistakes        0.77
fault           0.76
culp            0.74
admit           0.74
Activations Density 0.057%
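The density figure can be read as the fraction of token positions on which the feature fires at all. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample counts (57 active positions out of 100,000) are hypothetical and chosen to reproduce the displayed percentage, not taken from the underlying data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 57 of them active.
sample = [0.0] * 99_943 + [1.0] * 57
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is highly selective: it stays silent on almost all text and activates only on a small set of admission-related contexts.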