Explanations
foundation for learning or career
Negative Logits
Token           Value
\               0.53
with            0.49
enn             0.47
Министерство    0.45
0               0.45
د               0.45
eding           0.43
,               0.43
act             0.43
ent             0.42
Positive Logits
Token           Value
ש               0.62
basis           0.57
yere            0.57
も              0.56
↵               0.55
י               0.54
の実            0.52
x               0.52
פ               0.52
כ               0.52
Activations Density 0.006%