INDEX
Explanations
special characters likely specific to the model, potentially used as markers or embeddings for certain concepts or entities
instances of numerical values or quantities
Negative Logits
Token       Logit
Mub         -0.81
Robot       -0.80
crocod      -0.77
Haku        -0.76
Ag          -0.76
Sob         -0.72
Tier        -0.72
Solomon     -0.71
Liter       -0.70
Ide         -0.70
Positive Logits
Token       Logit
Loading     1.27
together    1.27
ó           1.22
matter      1.20
Page        1.18
Commission  1.18
older       1.18
administ    1.17
that        1.17
said        1.17
Activations Density 0.196%
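As a rough illustration of what a density figure like the one above usually means, the sketch below computes the fraction of token positions on which a feature's activation is nonzero. This is an assumption about the definition, not something the page states; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (# positions where the feature fires) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.196% would correspond to the feature firing on roughly 2 out of every 1000 sampled tokens.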