Explanations
denials and confirmations of involvement or wrongdoing in various situations
Negative Logits
tnc        -0.84
phabet     -0.73
tags       -0.71
Bonus      -0.70
tn         -0.69
inki       -0.69
wait       -0.67
incinn     -0.65
acent      -0.64
ository    -0.63
Positive Logits
allegations    1.13
wrongdoing     1.12
accusations    1.12
charges        0.98
allegation     0.96
involvement    0.94
accusation     0.93
denying        0.86
claims         0.80
interfering    0.78
Activations Density: 0.169%