INDEX
Explanations
punctuation marks, specifically commas
interrogative phrases and conjunctions
Negative Logits
RM            -0.64
SourceFile    -0.61
front         -0.60
SPA           -0.59
BS            -0.58
oir           -0.58
Nap           -0.58
letters       -0.58
LF            -0.57
Actor         -0.57
Positive Logits
into          0.65
limits        0.64
curfew        0.63
ceptor        0.61
ttle          0.60
tacit         0.59
diminishing   0.56
cipl          0.56
azard         0.55
κ             0.54
Activations Density 0.207%