INDEX
Explanations
last ice, next thought
NEGATIVE LOGITS
י        0.51
ের       0.50
ה        0.50
)        0.49
ל        0.49
ancak    0.48
ی        0.47
:        0.47
↵↵       0.44
ין       0.43
POSITIVE LOGITS
(              0.53
of             0.50
at             0.44
k              0.43
raded          0.42
ty             0.41
nes            0.40
run            0.40
ministerium    0.38
this           0.38
Activations Density 10.030%