INDEX
Explanations
links or instructions in a text
Negative Logits
instinct       -0.79
blowing        -0.78
exotic         -0.77
thrust         -0.76
overpowered    -0.76
paradox        -0.75
overpower      -0.74
potent         -0.73
snowball       -0.73
impression     -0.72
Positive Logits
Please          1.51
Alternatively   1.51
Otherwise       1.48
Additionally    1.47
Also            1.41
Depending       1.33
If              1.33
However         1.30
Each            1.27
Afterwards      1.25
Activations Density: 1.268%