INDEX
Explanations
Explanatory sentences starting with "Here"
Phrases that indicate a reference point to information or content
Negative Logits
ise            -0.62
abyte          -0.61
growth         -0.59
ank            -0.58
medication     -0.58
amas           -0.58
otherwise      -0.58
diplomacy      -0.57
credibility    -0.55
internal       -0.55
Positive Logits
Here            3.26
Here            2.17
Below           1.94
HERE            1.76
here            1.64
Below           1.45
here            1.41
Now             1.33
Above           1.25
Again           1.23
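The two logit lists rank output tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such lists could be produced, assuming the logit effect of a token is the dot product of the feature's decoder direction with that token's unembedding column (an assumption about how the dashboard computes it; the function names and the toy vectors are hypothetical and chosen only for illustration):

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, column))
        for token, column in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int = 10
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) logit effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    positive = ranked[::-1][:k]   # largest effect first
    negative = ranked[:k]         # most negative effect first
    return positive, negative

if __name__ == "__main__":
    # Hypothetical 2-dimensional toy "model"; these vectors are invented, not real weights.
    toy_unembed = {
        "Here": [2.0, 1.0],
        "Below": [1.5, 0.5],
        "growth": [-1.0, 0.2],
        "ise": [-1.2, -0.3],
    }
    direction = [1.0, 0.5]  # hypothetical decoder direction
    pos, neg = top_and_bottom(logit_effects(direction, toy_unembed), k=2)
    print(pos)
    print(neg)
```

In a real model the unembedding is a large matrix and the same ranking would typically be done with a vectorized top-k, but the logic is the same: sort once, read the head for positive logits and the tail for negative ones.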
Activation Density: 0.011%
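A density of 0.011% means the feature fires on roughly 11 of every 100,000 tokens (0.011% = 0.00011). A minimal sketch of the calculation, assuming density is the fraction of sampled tokens whose activation is strictly above zero; the function name and the `threshold` parameter are hypothetical:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

if __name__ == "__main__":
    # Invented sample: 11 active tokens out of 100,000.
    sample = [0.0] * 99_989 + [1.0] * 11
    print(f"{activation_density(sample) * 100:.3f}%")
```

With this invented sample the printed density matches the 0.011% shown above, which illustrates how sparse the feature is.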