INDEX
Explanations
phrases indicating the allocation or prioritization of importance or value
Negative Logits
zeug      -0.16
opup      -0.15
trag      -0.15
/from     -0.15
Avec      -0.15
ecz       -0.15
ape       -0.14
illy      -0.14
atsu      -0.14
apply     -0.14
Positive Logits
emphasis      0.30
bets          0.29
importance    0.23
blame         0.23
emphasis      0.23
Importance    0.21
emphasize     0.19
demands       0.19
placed        0.18
Limits        0.18
Activations Density: 0.044%