INDEX
Explanations
phrases containing the word "Tw" followed by a number and possibly other characters
the presence of specific tokens or symbols related to a particular format or category
Negative Logits
スト                -0.78
ャ                 -0.74
senal             -0.73
++++++++++++++++  -0.71
PRESS             -0.64
³³³³³³³³³³³³³³³³  -0.63
restricting       -0.61
tenance           -0.59
fracturing        -0.58
territorial       -0.58
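Some entries in this list are byte-level BPE token strings rather than readable text. This is a minimal sketch, assuming the model uses a GPT-2-style byte-level tokenizer. That is an inference from the token spellings and is not stated on this page. In that scheme each character stands for one raw byte, so ãĤ¹ãĥĪ maps back to the UTF-8 bytes for スト. A token made of sixteen ³ characters is sixteen 0xB3 bytes, which are not valid UTF-8 on their own. The byte table below follows GPT-2's `bytes_to_unicode` scheme. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict:
    """GPT-2-style table mapping each byte 0..255 to a printable character."""
    # Bytes that are already printable keep their own Latin-1 character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string back into text.

    Bytes that do not form valid UTF-8 by themselves become U+FFFD.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¹ãĥĪ", "ãĥ£", "³" * 16]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

A token that decodes to replacement characters is a fragment of a multi-byte character. Models see such fragments whenever a character is split across tokens.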
Positive Logits
elfth   1.36
enty    1.25
elve    1.23
olves   1.17
ilight  1.14
inkle   1.13
erker   1.13
orld    1.12
erk     1.12
orks    1.11
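The page does not say how these two lists are computed. A common approach for sparse-autoencoder feature dashboards, assumed in this sketch, multiplies the feature's decoder direction by the model's unembedding matrix. It then reports the tokens with the lowest and highest resulting logits. Read that way, the positive list looks like continuations of a word starting with "Tw" (Twelfth, Twenty, Twelve, Twilight, Twinkle). `logit_effects`, `decoder_direction`, and `W_U` are hypothetical names, and the toy numbers are made up.

```python
from typing import List, Sequence, Tuple

TokenLogit = Tuple[str, float]


def logit_effects(
    decoder_direction: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[TokenLogit], List[TokenLogit]]:
    """Project a feature's decoder direction through the unembedding.

    decoder_direction: length d_model (hypothetical input).
    W_U: d_model rows, each of length len(vocab) (hypothetical input).
    Returns (k most negative, k most positive) (token, logit) pairs.
    """
    d_model = len(decoder_direction)
    if len(W_U) != d_model:
        raise ValueError("W_U must have one row per model dimension")
    # logits[j] = sum over i of decoder_direction[i] * W_U[i][j]
    logits = [
        sum(decoder_direction[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # lowest logit first
    most_positive = ranked[::-1][:k]  # highest logit first
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy numbers, not taken from this feature.
    vocab = ["elfth", "enty", "senal", "PRESS"]
    W_U = [
        [1.0, 0.5, -0.5, -1.0],
        [0.2, 0.4, -0.2, 0.0],
    ]
    neg, pos = logit_effects([1.0, 1.0], W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

A production version would use a tensor library and may also account for the model's final layer norm. This sketch ignores both so the ranking logic stays visible.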
Activations Density 0.024%
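Activation density is the share of tokens on which the feature fires. At 0.024%, that is roughly 1 token in 4,200 (1 / 0.00024 ≈ 4,167). This sketch assumes "fires" means an activation strictly above a threshold of zero, measured over some evaluation set. Neither the threshold nor the dataset is given here. `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 24 active tokens out of 100,000 gives a density of about 0.024%.
    acts = [0.0] * 99_976 + [1.5] * 24
    print(f"{activation_density(acts):.3f}%")
```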