INDEX
Explanations
"quite" followed by an adjective
NEGATIVE LOGITS
Token    Logit
by       1.16
یم       1.13
3        1.08
های      1.00
0        1.00
ि        0.96
یش       0.96
</h3>    0.95
ג        0.95
้        0.93
POSITIVE LOGITS
Token    Logit
in       1.71
on       1.52
c        1.43
p        1.42
ak       1.38
t        1.38
al       1.24
৭        1.23
ut       1.18
ار       1.16
Activations Density: 0.006%