INDEX
Explanations
the word "for" in various contexts
Negative Logits
orthy         -0.81
rete          -0.79
oS            -0.79
dar           -0.78
zu            -0.74
rets          -0.71
yn            -0.69
irl           -0.68
shi           -0.67
osterone      -0.66
Positive Logits
etheless      1.01
nonetheless   0.99
reality       0.86
nevertheless  0.78
sheer         0.77
hindsight     0.73
actual        0.71
Garg          0.70
retrospect    0.67
Builder       0.63
Activations Density: 0.153%