INDEX
Explanations
phrases that indicate focus or attention towards a subject or topic
NEGATIVE LOGITS
Token     Logit
PT        -0.15
hest      -0.15
ause      -0.14
zim       -0.14
ротив     -0.14
nal       -0.14
@nate     -0.14
AVE       -0.14
らく      -0.14
colo      -0.13
POSITIVE LOGITS
Token     Logit
Tow       0.16
toward    0.15
เน        0.15
nač       0.15
areas     0.14
how       0.14
627       0.14
uzzi      0.14
lix       0.13
olan      0.13
Activations Density 0.056%
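
The density figure can be read as the share of token positions on which the feature fires. The sketch below is a minimal illustration under that assumption; the page itself does not state how the number is computed. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active (activation > threshold). The source page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 56 of which activate the feature.
sample = [0.0] * 99_944 + [1.0] * 56
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts the result is on the same scale as the value reported above: a feature that fires on roughly one token in every two thousand.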