INDEX
Explanations
phrases indicating instructions or recommendations
instructions or calls to action
Negative Logits
sadd        -0.69
reneg       -0.68
presided    -0.68
upsetting   -0.66
reunited    -0.66
supposedly  -0.66
certainly   -0.65
indeed      -0.65
reunion     -0.65
quo         -0.64
Positive Logits
Use         3.31
Use         2.54
Uses        2.20
USE         2.07
use         1.81
Usage       1.76
use         1.73
Using       1.69
Used        1.54
Using       1.43
Activations Density: 0.012%