INDEX
Explanations
important statements or justifications
phrases indicating significant consequences or importance
Negative Logits
Fuck       -0.72
Yep        -0.72
Damn       -0.69
Awesome    -0.69
ulhu       -0.67
Enjoy      -0.67
finished   -0.65
haha       -0.65
Finish     -0.64
hopped     -0.64
Positive Logits
Suppose          0.90
embodiments      0.87
economists       0.82
methodological   0.81
proponents       0.81
empirical        0.80
theorists        0.80
policymakers     0.79
typically        0.79
practitioners    0.79
Activations Density 0.916%