Explanations
occurrences of specific unicode characters
negations or phrases indicating disagreement
Negative Logits
Gaul        -0.65
jog         -0.65
sacrific    -0.62
blitz       -0.57
Allies      -0.56
capsule     -0.56
Stats       -0.55
Britons     -0.55
looms       -0.55
fumble      -0.55
Positive Logits
tre         0.91
ï¸ı         0.89
vable       0.81
forth       0.76
tu          0.75
emb         0.75
ulty        0.75
vent        0.74
yet         0.74
minent      0.74
Activations Density 0.135%