Explanations
phrases indicating upward movement or escalation
Negative Logits
Token         Logit
leich         -0.15
éry           -0.15
t             -0.15
.cleanup      -0.15
Vectorizer    -0.14
دو            -0.14
away          -0.14
elen          -0.14
νομ           -0.14
ecal          -0.14
Positive Logits
Token         Logit
bben          0.15
upa           0.15
bris          0.15
beden         0.14
柱            0.14
pillar        0.14
rè            0.14
!=(           0.14
乘            0.14
slack         0.13
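Logit lists like the two above are commonly produced by projecting the feature's output (decoder) direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes that procedure. The function name `logit_effects`, the direction, the matrix, and the vocabulary are all hypothetical toy values, not data from this feature.

```python
from typing import Sequence


def logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 3,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: rank tokens by the logit change a feature direction causes.

    Assumes `unembed` is d_model x vocab_size (rows = residual dims, columns = tokens).
    """
    d_model = len(direction)
    # Dot product of the feature direction with each token's unembedding column.
    scores = [
        sum(direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative logit effects first
    positive = ranked[::-1][:k]  # most positive logit effects first
    return positive, negative


# Toy example with made-up numbers (not this feature's real weights).
pos, neg = logit_effects(
    direction=[1.0, -0.5],
    unembed=[[0.2, -0.1, 0.0, 0.3], [0.1, 0.2, -0.4, 0.0]],
    vocab=["up", "away", "pillar", "slack"],
    k=2,
)
print([tok for tok, _ in pos])  # → ['slack', 'pillar']
print([tok for tok, _ in neg])  # → ['away', 'up']
```

Values in a real dashboard may also be normalized or scaled differently; this sketch only shows the ranking step.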
Activation Density: 0.044%
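Activation density is usually read as the percentage of token positions in a sample on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The helper name `activation_density` and the toy data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 44 active positions out of 100,000 tokens (made-up data).
acts = [0.0] * 99_956 + [1.2] * 44
print(f"{activation_density(acts):.3f}%")  # → 0.044%
```

A density this low means the feature is silent on almost every token, which fits a narrow explanation such as the one given above.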