Explanations
references to immediate actions or events, particularly those indicating urgency or direct consequence
Negative Logits
ango        -0.15
Mold        -0.15
dl          -0.15
encing      -0.15
ANGO        -0.15
cela        -0.14
pector      -0.14
exo         -0.14
arguments   -0.14
ιακ         -0.14
Positive Logits
485         0.15
adora       0.15
zy          0.15
embr        0.15
yan         0.15
869         0.14
Kir         0.14
atri        0.14
untu        0.14
516         0.14
Activations Density: 0.013%