INDEX
Explanations
closing tags and punctuation
Negative Logits
ار    0.62
m    0.54
ن    0.54
on    0.52
s    0.51
ন    0.46
ર    0.45
ri    0.45
of    0.44
as    0.44
Positive Logits
?    0.67
ה    0.47
ه    0.46
!    0.45
revital    0.44
</strong>    0.42
া    0.42
끔    0.39
</h3>    0.38
ig    0.38
Activations Density: 0.396%