INDEX
Explanations
phrases indicating levels of significance or urgency
Negative Logits
Backbone    -0.15
تÙĩا    -0.14
allo        -0.14
รà¸ĵ    -0.14
pective     -0.14
Targets     -0.14
Criteria    -0.13
ames        -0.13
.ng         -0.13
Bust        -0.13
Positive Logits
signal         0.23
wakeup         0.22
victory        0.21
win            0.19
sign           0.19
wake           0.19
coup           0.19
slap           0.19
ticking        0.18
opportunity    0.18
Activations Density: 0.190%