INDEX
Explanations
words that indicate significant events or conditions related to change and impact
Negative Logits
ï           -0.16
various     -0.14
ốn          -0.14
/she        -0.13
pic         -0.13
/from       -0.12
...↵↵↵↵     -0.12
sembles     -0.12
ưởng        -0.12
ses         -0.12
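Tokenizers in the GPT-2 family store each token as a string of bytes. Raw dashboards often print the multi-byte UTF-8 characters in those tokens as runs of Latin-1 look-alikes. For example, `Æ°á»Łng` is the Vietnamese fragment `ưởng` and `ä¸Ķ` is the Chinese character `且`. The sketch below reverses that display mapping. It assumes the model uses GPT-2's byte-to-unicode table, which the token strings suggest but this page does not state, and `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's byte -> display-character table (same construction as GPT-2's encoder)."""
    # Bytes that are already printable Latin-1 characters display as themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to U+0100 and up, in order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> original byte.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytearray()
    for ch in display:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters a viewer has already decoded (e.g. Arabic letters) pass through.
            raw.extend(ch.encode("utf-8"))
    # A token can end mid-character; undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_token("\u00c6\u00b0\u00e1\u00bb\u0141ng"))  # displayed as Æ°á»Łng
```

A lone byte such as `ï` (0xEF) is only the first byte of a three-byte character, so it cannot be decoded on its own and comes back as the replacement character.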
Positive Logits
ly          0.30
-looking    0.28
lest        0.28
LY          0.22
ترین        0.22
mente       0.21
جدا         0.20
ily         0.20
aneously    0.20
且          0.19
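The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down or up. A common way to build such lists, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix `W_U` and keep the bottom and top `k` entries. Some dashboards also fold in the final LayerNorm, which this sketch ignores. `feature_logits`, `top_k`, and the toy matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_logits(decoder_dir: Sequence[float],
                   W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U (d_model rows x d_vocab columns)."""
    d_vocab = len(W_U[0])
    return [sum(decoder_dir[m] * W_U[m][t] for m in range(len(decoder_dir)))
            for t in range(d_vocab)]


def top_k(logits: Sequence[float], vocab: Sequence[str],
          k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, four-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = [[0.3, -0.2, -0.1, 0.25],
       [0.05, 0.05, 0.05, 0.05]]
neg, pos = top_k(feature_logits([1.0, 0.0], W_U), vocab, k=2)
print(neg)  # most suppressed tokens first
print(pos)  # most promoted tokens first
```

Because the direct effect is a single dot product per token, the lists show which outputs the feature promotes or suppresses when it fires. They do not show which inputs make it fire.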
Activation Density: 1.585%
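Activation density is usually read as the fraction of tokens in a sample on which the feature's activation is above zero. That reading is assumed here. At 1.585%, the feature fires on roughly 16 of every 1,000 tokens. The sketch below computes the figure from a list of per-token activations. `activation_density`, its `threshold` parameter, and the 317-of-20,000 sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 317 firing tokens out of 20,000 gives 1.585%.
acts = [1.0] * 317 + [0.0] * 19_683
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

Raising `threshold` above zero counts only stronger activations, which lowers the reported density.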