INDEX
Explanations
phrases or sentences containing punctuation symbols followed by specific characters
instances of strong negative sentiments or failures
Negative Logits
equip       -0.68
inverse     -0.66
lifes       -0.64
equival     -0.63
akedown     -0.62
legally     -0.61
boro        -0.61
alley       -0.60
associate   -0.59
uchs        -0.59
Positive Logits
Therefore      0.85
Nevertheless   0.85
However        0.84
Nonetheless    0.83
Firstly        0.83
³³³³           0.83
SEE            0.81
Examples       0.81
Anonymous      0.81
RESULTS        0.80
Activations Density 0.829%