INDEX
Explanations
various forms of separators, specifically slashes and related punctuation
Negative Logits
ebek      -0.14
Ga        -0.14
arget     -0.14
rie       -0.13
cheon     -0.13
ga        -0.13
athers    -0.13
otal      -0.13
Bros      -0.13
wy        -0.13
Positive Logits
Ïĥι       0.17
/etc      0.16
ien       0.15
etc       0.15
etc       0.15
tainment  0.15
anela     0.14
posted    0.14
alike     0.14
illing    0.14
Activations Density 0.088%