INDEX
Explanations
references to readers and audience engagement in texts
NEGATIVE LOGITS
644           -0.17
obliged       -0.16
oyo           -0.16
Surveillance  -0.15
pter          -0.15
ahoma         -0.15
pts           -0.15
akis          -0.15
apiro         -0.15
erties        -0.14
POSITIVE LOGITS
hip     0.19
riel    0.16
UBY     0.16
.accel  0.15
hood    0.15
rug     0.15
dil     0.15
care    0.14
atik    0.13
ti      0.13
Activations Density 0.027%