Explanations
phrases related to denial or refutation
Negative Logits
tags       -0.76
emetery    -0.75
tnc        -0.74
incinn     -0.72
tools      -0.72
inki       -0.71
uyomi      -0.69
cffffcc    -0.69
pes        -0.69
Bonus      -0.69
Positive Logits
wrongdoing     1.00
allegations    0.93
accusations    0.87
vehemently     0.86
contradict     0.83
allegation     0.83
involvement    0.79
innocence      0.78
claims         0.78
denying        0.77
Activations Density 0.119%