INDEX
Explanations
words related to decision-making or actions
phrases related to conditions or actions that are contingent on prior events
Negative Logits
emale      -0.70
alde       -0.66
cised      -0.62
acial      -0.60
apolis     -0.57
akedown    -0.55
allel      -0.55
FOX        -0.54
idal       -0.54
Cham       -0.54
Positive Logits
something  1.78
anything   1.65
things     1.63
something  1.60
things     1.57
nothing    1.56
THING      1.56
Something  1.55
Things     1.54
stuff      1.51
Activations Density 0.363%
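
The density figure can be read as the share of token positions on which the feature fires. The sketch below assumes that reading (activation strictly above a threshold of zero); the function name, the threshold parameter, and the sample values are hypothetical and are not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of positions with a non-zero
    (above-threshold) activation, expressed as a percentage.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical activations for 8 token positions; only one is non-zero.
sample = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3f}%")  # prints 12.500%
```

Under this reading, a density of 0.363% would mean the feature is active on roughly 1 in every 275 sampled tokens.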