INDEX
Explanations
numbers indicating points being scored in different contexts
instances of the word "score"
Negative Logits
agan        -0.81
etheless    -0.61
hind        -0.61
lly         -0.60
perial      -0.58
Society     -0.57
IDA         -0.57
conn        -0.57
por         -0.57
pleas       -0.56

Positive Logits
score        1.21
scores       1.08
Score        1.01
ificant      0.93
card         0.91
keeper       0.88
Scores       0.87
scored       0.83
cards        0.80
Score        0.80
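The logit lists rank vocabulary tokens by how strongly this feature pushes their logits up (positive) or down (negative). Below is a minimal sketch of how such lists can be produced. It assumes, as is common in sparse-autoencoder dashboards but not stated on this page, that a feature's logit effect is its decoder direction multiplied by the model's unembedding matrix. The helper `top_logit_effects`, the arrays `w_dec_row` and `w_unembed`, and the toy values are all hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumption: logit effect = decoder direction @ unembedding matrix.
    w_dec_row has shape (d_model,), w_unembed has shape (d_model, vocab_size).
    """
    effects = w_dec_row @ w_unembed                # shape: (vocab_size,)
    order = np.argsort(effects)                    # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (hypothetical values): d_model = 2, vocabulary of 4 tokens.
vocab = ["score", "agan", "card", "the"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[1.2, -0.8, 0.3, 0.0],
                      [0.5, 0.5, 0.5, 0.5]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing from both ends gives both lists from a single pass over the vocabulary.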
Activation Density: 0.010%
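An activation density of 0.010% means the feature fires on roughly 1 in 10,000 tokens, which makes it a very sparse feature. The sketch below shows the calculation. It assumes that density is the fraction of sampled tokens whose activation is strictly above zero. The threshold, the token sample, and the helper name `activation_density` are assumptions, not details taken from this dashboard.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature fires.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy example (hypothetical data): the feature is active on 1 of 10,000 tokens.
acts = np.zeros(10_000)
acts[42] = 3.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```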