INDEX
Explanations
sentences that contain statistical data or numerical comparisons
Negative Logits
peace            -0.73
bounty           -0.72
wellness         -0.72
compassionate    -0.71
hunger           -0.71
slick            -0.69
greenhouse       -0.69
altru            -0.68
safe             -0.68
garbage          -0.67
Positive Logits
Both             1.48
Finally          1.44
Either           1.39
Again            1.36
Lastly           1.35
Together         1.34
Neither          1.30
Regardless       1.29
Thus             1.26
Interestingly    1.26
Activations Density 0.423%