Explanations
phrases related to indicators or symptoms of change
Negative Logits
exels            -0.17
acin             -0.16
stood            -0.15
ATED             -0.15
ewan             -0.14
avaş             -0.14
_SCENE           -0.14
riot             -0.14
[token missing]  -0.13
erli             -0.13
Positive Logits
pointing         0.23
posting          0.21
posts            0.19
posted           0.19
pointer          0.19
indicating       0.19
indic            0.19
istr             0.18
alerts           0.17
indicate         0.16
Activations Density 0.022%