Explanations
specific nouns followed by punctuation
Negative Logits
Token         Value
without       0.59
belirli       0.53
без           0.52
granular      0.51
dengan        0.49
specific      0.47
Без           0.46
WITH          0.45
WITHOUT       0.45
bestimmten    0.45
Positive Logits
Token           Value
والذي           0.77
famed           0.75
nonché          0.72
والتي           0.71
nicknamed       0.70
,—              0.66
commemorated    0.65
convened        0.65
occasioned      0.64
(!              0.64
Activations Density 0.715%