NEGATIVE LOGITS
—     -1.50
—     -1.39
"—    -1.24
)—    -1.23
”—    -1.16
——    -1.09
—"    -1.08
——    -1.05
—(    -0.98
—,    -0.98
POSITIVE LOGITS
ThroughAttribute    0.63
Kit                 0.47
para                0.47
kit                 0.47
سب                  0.45
´                   0.45
ɵɵ                  0.45
rast                0.44
thẩm                0.44
SPECIFIC            0.44
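The card reports each list as the extreme ends of one per-token score vector. This sketch assumes each score is the feature's direct effect on that token's logit, which is a common dashboard view that this page does not confirm. Under that assumption, the function sorts the scores and keeps the k lowest and k highest (token, score) pairs. The function name `top_and_bottom_logits` is hypothetical. The tokens `neg_a` and `neg_b` in the usage are placeholders, not values from the card.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_and_bottom_logits(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most negative and k most positive (token, score) pairs."""
    # Sort ascending by score so the most negative entries come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return negative, positive

# Usage: three positive scores taken from the card plus two placeholder negatives.
scores = {"ThroughAttribute": 0.63, "Kit": 0.47, "rast": 0.44, "neg_a": -1.50, "neg_b": -1.23}
neg, pos = top_and_bottom_logits(scores, 2)
print(neg)  # [('neg_a', -1.5), ('neg_b', -1.23)]
print(pos)  # [('ThroughAttribute', 0.63), ('Kit', 0.47)]
```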
Activation density: 1.844%
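Activation density is usually read as the share of sampled token positions where the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The page does not state the exact sampling or threshold, and `activation_density` is a hypothetical name. Under that reading, 1.844% means roughly 18 active positions per 1,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active (assumed: strictly > threshold).
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Usage: one active position out of four gives 25%.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```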