INDEX
Explanations
occurrences of the phrase "after 9" or related constructions
Negative Logits
zelf            -0.08
жÑĥ             -0.07
swick           -0.07
vd              -0.07
norge           -0.06
nap             -0.06
rowse           -0.06
unda            -0.06
ErrorException  -0.06
Violation       -0.06
Positive Logits
words      0.12
thought    0.11
wards      0.11
word       0.10
ward       0.09
/b         0.09
initial    0.08
effects    0.08
no         0.07
initially  0.07
Activations Density: 0.048%