Explanations
phrases indicating complexity or intensity in experiences or narratives
Negative Logits
imos     -0.17
ekim     -0.16
pread    -0.16
ãĢij      -0.15
inders   -0.15
Dao      -0.15
ÎķÎł     -0.14
ÏĦικ     -0.14
à¹ĭ      -0.14
Bair     -0.14
Positive Logits
hw       0.19
UTTON    0.15
leta     0.15
sorte    0.14
sort     0.14
-NLS     0.14
lope     0.14
ys       0.14
ioni     0.14
imson    0.13
Activations Density: 0.018%