INDEX
Explanations
sentences ending with a full stop
sentence-like structures, particularly those that suggest a statement or assertion
Negative Logits
distinguished   -0.79
activ           -0.74
scattered       -0.74
classified      -0.73
effected        -0.73
eager           -0.72
purposes        -0.72
broadly         -0.70
observers       -0.70
uly             -0.70
Positive Logits
Beware          1.35
Conclusion      1.22
[+              1.15
Avoid           1.09
Why             1.09
Deter           1.05
Provide         1.05
Thou            1.05
Increase        1.02
Explain         1.02
Activations Density: 0.062%