INDEX
Explanations
words that indicate support or encouragement in various contexts
Negative Logits
dos          -0.73
wording      -0.69
Tsukuyomi    -0.61
uyomi        -0.61
mainland     -0.61
Fifty        -0.60
thresholds   -0.59
halfway      -0.59
Antar        -0.59
HRC          -0.58
Positive Logits
ered         1.23
rative       1.20
erest        1.17
ering        1.11
erer         1.05
erers        1.05
cest         1.05
racted       1.04
iced         1.01
urers        1.00
Activations Density: 0.007%