INDEX
Explanations
emphasized phrases or clauses
Negative Logits
no: 0.24
and: 0.23
keep: 0.22
get: 0.22
it: 0.22
don: 0.21
we: 0.21
please: 0.21
includes: 0.21
including: 0.21
Positive Logits
چنان: 0.23
যখন: 0.21
ആരോപ: 0.19
\{: 0.19
virulent: 0.19
теркә: 0.18
bergantung: 0.18
versive: 0.18
vốn: 0.18
பிரச்சின: 0.18
Activation Density: 0.240%
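Activation density is commonly reported as the percentage of token positions in the sampled dataset on which the feature's activation is positive. The sketch below assumes that definition (a threshold of 0) and uses made-up activations rather than this feature's real data. Under that assumption, 0.240% corresponds to about 24 firing positions per 10,000 tokens. The function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where activation > threshold.

    Assumes density = (number of active positions / total positions) * 100;
    the threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, the feature fires on 24 of them.
acts = [0.0] * 10_000
for i in range(24):
    acts[i * 400] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")  # prints 0.240%
```

A sparse feature like this one fires rarely, which is why the density is well below 1%.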