INDEX
Explanations
phrases related to justification or reason
the word "because" in various contexts to indicate reasoning or justification
Negative Logits
mint      -0.75
load      -0.71
tumblr    -0.69
lez       -0.69
Trend     -0.68
conom     -0.68
gren      -0.67
TABLE     -0.67
Samson    -0.63
sell      -0.62
Positive Logits
nor           0.91
unsub         0.80
anymore       0.78
neither       0.75
spoilers      0.69
yet           0.69
censorship    0.69
lack          0.68
pesky         0.67
Nor           0.67
Activations Density: 0.141%