Explanations
the word "or" in various contexts
Negative Logits
Token          Logit
ires           -0.79
estern         -0.78
erest          -0.77
rue            -0.76
olicy          -0.75
irms           -0.74
axies          -0.72
20439          -0.71
estro          -0.69
arten          -0.69
Positive Logits
Token          Logit
chard          0.98
alternatively  0.97
ifice          0.93
acles          0.89
equival        0.88
acle           0.83
chest          0.82
chid           0.82
acular         0.81
phans          0.79
Activations Density: 0.039%