Explanations
percentage values
Negative Logits
shaped      -0.73
WithNo      -0.71
gotten      -0.70
jected      -0.67
stru        -0.67
flanked     -0.65
iety        -0.64
gad         -0.64
cru         -0.64
psychiat    -0.64
Positive Logits
%-          0.83
FREE        0.77
-+          0.75
payer       0.74
lein        0.72
!/          0.71
fps         0.70
%           0.70
+)          0.70
ABV         0.69
Activations Density 0.057%