INDEX
Explanations
instances of the word "groping" and similar variations relating to unwanted physical contact
Negative Logits
Token       Logit
tune        -0.88
Tune        -0.70
REE         -0.69
senal       -0.69
SO          -0.68
Effective   -0.68
VID         -0.68
VIS         -0.67
DIT         -0.66
ministry    -0.65
Positive Logits
Token       Logit
grop        1.12
ingly       0.89
ing         0.89
atted       0.89
ured        0.87
estation    0.86
ographs     0.85
eties       0.85
raped       0.85
ating       0.84
Activations Density 0.006%