Explanations
repeated phrases or structures, particularly involving the word "to."
Negative Logits
up      -0.18
ro      -0.17
irc     -0.16
wood    -0.15
anc     -0.15
au      -0.14
vol     -0.14
pii     -0.14
ago     -0.14
rb      -0.14
Positive Logits
ehr     0.19
asters  0.19
ools    0.18
plevel  0.17
othy    0.17
pek     0.16
aster   0.16
lags    0.16
ASTER   0.16
eh      0.15
Activations Density: 0.166%