Explanations
punctuation marks and delimiters in the text
Negative Logits
Token      Logit
Cubans     -0.58
armani     -0.58
不就是      -0.56
Umberto    -0.56
Cæsar      -0.55
Pompey     -0.55
Goliath    -0.55
Bellini    -0.54
gne        -0.54
Bix        -0.53
Positive Logits
Token      Logit
"]);       1.47
']))       1.39
"])        1.39
)";        1.38
"];        1.36
"]));      1.33
"]         1.32
"],        1.32
"]];       1.31
")));      1.28
Activations Density: 0.224%