Explanations
phrases expressing contrasts and comparisons between different perspectives or approaches
topics related to legal matters and political commentary
Negative Logits
ore         -0.69
vernment    -0.66
abase       -0.63
irie        -0.62
orc         -0.61
cel         -0.61
retro       -0.59
Els         -0.59
ceed        -0.58
overdue     -0.57
Positive Logits
whichever   0.72
depending   0.72
Therefore   0.71
causation   0.65
apo         0.64
Either      0.64
Hence       0.63
depending   0.63
Therefore   0.62
energy      0.60
Activations Density: 0.552%