INDEX
Explanations
phrases or words that emphasize exclusivity or singularity
Negative Logits
Token       Logit
simply      -0.16
Simply      -0.15
onders      -0.15
lets        -0.14
alias       -0.14
ikal        -0.14
_simps      -0.13
cken        -0.13
istle       -0.13
whatever    -0.13
Positive Logits
Token       Logit
thing       0.38
remaining   0.33
way         0.29
remaining   0.28
Remaining   0.25
thing       0.25
Thing       0.25
Thing       0.23
Remaining   0.22
surviving   0.21
Activations Density: 0.041%
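
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.2, 0.7, 3.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density of this order means the feature is sparse: it fires on only a few tokens per ten thousand.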