Explanations
describing purpose or nature
Negative Logits
Token        Logit
lowercase    0.49
another      0.47
ان           0.46
pleasant     0.46
that         0.45
itimate      0.43
ﺍ            0.43
aggravate    0.42
ر            0.42
pathetic     0.42
Positive Logits
Token        Logit
μέσα         0.45
везде        0.44
);           0.43
)--          0.42
addon        0.40
`;           0.40
unseres      0.40
แผน          0.40
versch       0.40
построен     0.40
Activations Density 0.004%