INDEX
Explanations
statements of argumentation or claims made in a narrative or discourse
Negative Logits
vel      -0.16
dek      -0.14
agens    -0.14
egas     -0.14
eric     -0.14
von      -0.13
Ed       -0.13
ho       -0.13
ugu      -0.13
arga     -0.13
Positive Logits
_epi             0.16
omik             0.15
816              0.15
uild             0.15
ContentLoaded    0.14
ýt               0.14
rame             0.14
šak              0.14
%^               0.14
nez              0.14
Activations Density: 0.195%