Explanations
phrases that indicate ongoing actions or efforts
Negative Logits
ieu           -0.65
ruby          -0.63
scription     -0.63
belief        -0.62
ependent      -0.62
eteenth       -0.61
Frie          -0.61
Information   -0.61
Perspective   -0.59
Barron        -0.59
Positive Logits
elevate       0.88
obtain        0.87
enhance       0.85
abolish       0.85
broaden       0.83
stabilize     0.83
strengthen    0.82
meet          0.82
improve       0.82
maximize      0.81
Activations Density: 0.099%