INDEX
Explanations
words related to established patterns or practices
Negative Logits
ileaks    -0.85
stocks    -0.77
gdala     -0.76
gow       -0.74
iaries    -0.69
obin      -0.68
entin     -0.68
kids      -0.66
ackets    -0.65
izoph     -0.65
Positive Logits
istically       0.90
istic           0.86
procedure       0.81
circumcision    0.81
slaughter       0.80
ALLY            0.80
ised            0.78
ized            0.77
performed       0.75
routine         0.74
Activations Density 0.078%