INDEX
Explanations
names of entities or concepts
Negative Logits
te      0.70
f       0.68
ed      0.67
ite     0.59
6       0.59
ir      0.58
सँग     0.58
ad      0.57
5       0.57
and     0.56
Positive Logits
to      0.75
that    0.55
нтов    0.54
пациен  0.52
bebe    0.48
it      0.48
관리    0.47
ого     0.47
ਕਿ      0.45
že      0.45
Activations Density 0.319%