INDEX
Explanations
the word "else"
the phrase "anything else."
Negative Logits
gets        -0.68
gers        -0.66
Runner      -0.63
hai         -0.63
haw         -0.63
cers        -0.62
Soviets     -0.61
lord        -0.59
Roose       -0.59
iterator    -0.59
Positive Logits
worldly     0.89
describ     0.89
besides     0.85
includ      0.84
imaginable  0.80
nces        0.76
mattered    0.75
happened    0.72
icum        0.69
else        0.68
Activations Density 0.019%