INDEX
Explanations
phrases indicating a shift in tone or perspective
Negative Logits
Token        Logit
arin         -0.15
takže        -0.14
vậy          -0.14
).           -0.14
かわ         -0.14
)).          -0.13
terminate    -0.13
ャ           -0.13
。但         -0.13
imen         -0.13
Positive Logits
Token        Logit
and          0.18
whose        0.18
-)           0.18
_)           0.17
_)           0.17
and          0.17
which        0.16
whose        0.16
or           0.16
_Tis         0.16
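The card does not say how the logit values above were produced. A common approach for feature pages like this is to project the feature's decoder direction onto the model's unembedding matrix and read off the most negative and most positive vocabulary entries. The sketch below shows that idea under stated assumptions: the function name `top_logit_effects`, the argument names, and the shapes are hypothetical, and the actual tool may normalize or scale the values differently.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Hypothetical helper: project one feature's decoder vector onto the unembedding.

    w_dec_row: shape (d_model,), the feature's decoder direction (assumed input).
    w_u: shape (d_model, vocab_size), the model's unembedding matrix (assumed input).
    vocab: list of token strings, indexed like the columns of w_u.
    Returns (negative, positive) lists of (token, value) pairs, k entries each.
    """
    logits = w_dec_row @ w_u            # direct effect on every vocabulary logit
    order = np.argsort(logits)          # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

Note that distinct tokens can render identically once leading whitespace is dropped (for example `and` versus ` and`), which is one plausible reason entries such as `and`, `whose`, and `_)` each appear twice in the lists above.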
Activations Density 0.109%
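Activation density is usually read as the fraction of token positions on which the feature fires. The card does not define its threshold, so the minimal sketch below assumes "fires" means an activation strictly greater than zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature activation
    exceeds `threshold` (assumed to be 0.0, i.e. any nonzero positive activation)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))
```

Under this definition, a density of 0.109% corresponds to the feature firing on roughly 109 of every 100,000 tokens, which marks it as a sparse feature.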