INDEX
Explanations
references to judgment and control within a societal or hierarchical context
Negative Logits
fubject  -0.64
pleaſure  -0.60
myſelf  -0.59
HtmlAttribute  -0.59
Monfieur  -0.59
wnież  -0.56
ſta  -0.54
للاسماء  -0.54
周泽  -0.54
TextWatcher  -0.54
Positive Logits
shameless  0.48
dog  0.43
slapped  0.41
pit  0.40
beaten  0.40
shit  0.39
outrageous  0.38
miserably  0.38
Dog  0.38
shits  0.37
Activations Density 0.082%