Explanations
phrases that indicate frequency or repetition of events
Negative Logits
Ì£    -0.16
.scalablytyped    -0.16
paged    -0.15
رÙĬÙĥÙĬØ©    -0.14
shed    -0.14
áº    -0.14
iou    -0.14
ease    -0.14
次    -0.14
ìĿ´ìĹIJ    -0.14
Positive Logits
blue    0.29
awhile    0.27
Blue    0.25
blue    0.25
-blue    0.23
BLUE    0.23
Blue    0.23
BLUE    0.22
aw    0.21
.blue    0.21
Activations Density 0.014%