Explanations
instances of the word "alleged" in a text
references to alleged incidents or claims of wrongdoing
Negative Logits
guiActiveUnfocused   -0.73
Clicker              -0.71
ARCH                 -0.70
pb                   -0.69
eyes                 -0.68
OVA                  -0.66
bern                 -0.65
ajo                  -0.63
beard                -0.63
patch                -0.63
Positive Logits
iary                  0.87
allegations           0.85
ities                 0.85
accusations           0.79
abuser                0.78
accuser               0.77
edly                  0.74
misrepresent          0.74
violations            0.72
abuse                 0.72
Activation Density: 0.025%