INDEX
Explanations
phrases relating to key points or insights
Negative Logits
etheless      -0.66
brill         -0.64
advertised    -0.62
agre          -0.61
david         -0.60
constitu      -0.59
orchestr      -0.59
submar        -0.58
reluct        -0.57
warr          -0.57
Positive Logits
aways         1.70
overs         1.27
away          1.16
offs          1.01
over          1.00
shots         0.96
OVER          0.88
prising       0.82
outs          0.81
out           0.81
Activations Density: 0.036%