Explanations
proper nouns and technical terms
Negative Logits
s          1.09
it         0.95
(          0.79
a          0.75
igating    0.75
:          0.72
idiots     0.71
س          0.70
gasped     0.70
finisher   0.68
Positive Logits
ן          1.04
ской       0.83
ת          0.82
ν          0.81
ní         0.80
ນ          0.79
ﻟ          0.79
ında       0.77
ised       0.75
ও          0.73
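The two lists above rank vocabulary tokens by how strongly the feature raises or lowers their logits. Below is a minimal sketch of one common way to compute such lists. It assumes that the ranking comes from projecting the feature's decoder vector through the model's unembedding matrix. The names `top_logit_effects`, `w_dec_f`, `W_U`, and `vocab` are hypothetical. This is not the dashboard's confirmed method. The sketch returns signed effects, so entries in the negative list carry negative values.

```python
import numpy as np

def top_logit_effects(w_dec_f, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (hypothetical helper).

    w_dec_f: (d_model,) decoder vector for the feature (assumed input)
    W_U:     (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:   list of token strings, length vocab_size
    Returns (positive, negative): lists of (token, signed effect), k each.
    """
    effects = w_dec_f @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(effects)      # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage: a 2-dimensional model with a 4-token vocabulary.
pos, neg = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[2.0, -1.0, 0.5, -3.0],
              [0.0,  0.0, 0.0,  0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
```

Real dashboards may add extra steps before ranking, such as folding in a final layer-norm scale. The bare projection shown here is the simplest version.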
Activations Density 0.030%
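Activation density is the share of tokens on which the feature fires. A density of 0.030% works out to roughly 3 active tokens per 10,000. Below is a minimal sketch of that calculation. It assumes a feature counts as active whenever its activation exceeds a threshold of 0. Both the threshold and the helper name `activation_density` are assumptions made for illustration.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens where the feature is active (hypothetical helper).

    acts:      1-D array with the feature's activation on each sampled token
    threshold: assumed cutoff; a token counts as active when act > threshold
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: 3 firing tokens out of 10,000 gives about 0.030%.
acts = np.zeros(10_000)
acts[:3] = 1.0
density = activation_density(acts)
```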