INDEX
Explanations
mentions of specific days of the week
occurrences of plural nouns and specific adjective forms related to animals
Negative Logits
enhagen   -0.69
glers     -0.68
azing     -0.68
Cruiser   -0.67
Bei       -0.62
OSH       -0.58
boards    -0.56
Qing      -0.56
arity     -0.55
ASC       -0.55
Positive Logits
cript     1.35
cence     1.28
hift      1.21
pring     1.21
ilver     1.18
aurus     1.18
chool     1.18
cue       1.18
cu        1.17
creen     1.15
Activations Density: 0.146%