INDEX
Explanations
phrases indicating prior statements or references made in the text
Negative Logits
ey           -0.17
hausen       -0.15
travers      -0.15
vag          -0.14
Ethnic       -0.13
ContentPane  -0.13
etter        -0.13
oci          -0.13
Greater      -0.13
rib          -0.13
Positive Logits
foy          0.16
.Factory     0.15
-Men         0.15
[assembly    0.14
mad          0.14
nge          0.14
adık         0.14
kul          0.14
øre          0.14
TOT          0.14
Activations Density: 0.048%