Explanations
instances of the word "mark"
references to numerical ratings or scores
Negative Logits
ILLE      -0.90
abama     -0.74
traged    -0.71
rouch     -0.69
anooga    -0.69
odka      -0.67
Kut       -0.66
urus      -0.66
agan      -0.65
ctory     -0.63
Positive Logits
manship    1.46
downs      1.09
down       1.09
eters      1.05
Twain      0.97
ups        0.96
eting      0.89
emark      0.86
lights     0.84
boards     0.81
Activation Density: 0.048%