Explanations
comparisons and evaluations between different entities or situations
descriptions of surprising or distressing situations
Negative Logits
ocrates      -0.69
anners       -0.65
oline        -0.63
etc          -0.60
rained       -0.60
decaying     -0.58
ustomed      -0.57
uberty       -0.56
whoever      -0.56
PLIED        -0.56
Positive Logits
notable       0.70
noteworthy    0.70
ici           0.69
besides       0.68
icy           0.67
interven      0.63
word          0.62
Featured      0.62
downside      0.62
eyebrow       0.61
Activation density: 0.357%