Explanations
instances where the text discusses a singular, specific item or topic among a set of choices
Negative Logits
etz       -0.80
storms    -0.74
des       -0.73
gnu       -0.69
redits    -0.69
illus     -0.68
invoke    -0.68
skirts    -0.68
mire      -0.66
ruary     -0.66
Positive Logits
thing         1.36
conceivable   1.24
reason        1.24
exception     1.20
way           1.19
remaining     1.17
viable        1.13
downside      1.11
sane          1.10
difference    1.09
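The two lists above are the tokens with the most negative and most positive logit effects for this feature. A minimal sketch of picking such lists, assuming you already have a mapping from token to logit effect (commonly obtained by projecting the feature's decoder direction through the model's unembedding; that step is an assumption here and is not shown). The helper name `top_and_bottom_k` is hypothetical, and the sample data reuses a few values from the lists above.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_k(effects: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, effect) pairs.

    Hypothetical helper: `effects` is assumed to map each token string to the
    feature's logit effect on that token.
    """
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# A few values copied from the lists above.
effects = {
    "thing": 1.36,
    "conceivable": 1.24,
    "reason": 1.24,
    "etz": -0.80,
    "storms": -0.74,
    "des": -0.73,
}
pos, neg = top_and_bottom_k(effects, 2)
print(pos)
print(neg)
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only the top few tokens are needed.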
Activation density: 0.051%
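A density of 0.051% means the feature fires on roughly 5 of every 10,000 tokens. A minimal sketch of that calculation, assuming density is the fraction of sampled tokens on which the feature's activation is above zero (the threshold and the helper name `activation_density` are assumptions for illustration):

```python
import math
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 51 active tokens out of 100,000.
acts = [0.0] * 99_949 + [1.0] * 51
density = activation_density(acts)
print(f"{density:.3%}")
```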