Explanations
instances of the word "the."
Negative Logits
ambo              -0.93
abul              -0.78
ndra              -0.76
gb                -0.75
tions             -0.74
aza               -0.71
Operation         -0.71
ioned             -0.70
Lago              -0.68
-+-+              -0.68
Positive Logits
brunt              1.23
plunge             1.05
reins              1.02
initiative         1.00
blame              0.94
same               0.91
opportunity        0.89
responsibility     0.89
slightest          0.89
cues               0.89
Activations Density: 0.029%