INDEX
Explanations
phrases indicating summaries, overviews, or descriptions of content
Negative Logits
á»Ļc       -0.17
.dylib    -0.16
ichni     -0.15
elp       -0.14
eldom     -0.14
ç¢        -0.13
..<       -0.13
gart      -0.13
iggins    -0.13
opleft    -0.13
Positive Logits
list          0.19
overview      0.18
glimpse       0.18
list          0.17
listing       0.17
view          0.15
outline       0.15
sar           0.15
description   0.15
mw            0.14
Activations Density: 0.106%