INDEX
Explanations
the word "as" in various contexts
Negative Logits
upt      -0.72
oller    -0.71
lish     -0.69
obyl     -0.69
ople     -0.66
ools     -0.65
stall    -0.65
Ò        -0.64
rir      -0.61
oqu      -0.61
Positive Logits
regards       1.39
evidenced     1.14
pects         1.03
piring        0.95
opposed       0.94
soon          0.94
pired         0.88
well          0.85
contrasted    0.84
pires         0.81
Activations Density: 0.089%