INDEX
Explanations
phrases related to communication, such as quotes and questions
punctuation characters and formatting symbols used in the text
Negative Logits
edo        -0.76
Bengal     -0.74
dispers    -0.72
omorphic   -0.68
ured       -0.67
Vaugh      -0.67
destro     -0.66
laus       -0.65
Afric      -0.65
illary     -0.65
POSITIVE LOGITS
_>         0.91
wcsstore   0.90
SOURCE     0.85
[[         0.84
MORE       0.82
<<         0.80
lations    0.80
PER        0.78
HEAD       0.74
QUEST      0.72
Activations Density: 0.008%