INDEX
Explanations
phrases indicating comparison or evaluation, focusing on the outcome or result
phrases that describe conditions or situations and their characteristic qualities
Negative Logits
————      -0.63
MQ        -0.62
Doing     -0.59
wheel     -0.59
SPA       -0.58
ugu       -0.58
ulk       -0.57
onday     -0.56
Whe       -0.56
ipping    -0.56

Positive Logits
resembles      1.14
exceeds        1.08
justifies      1.01
contradicts    0.99
mirrors        0.97
inspires       0.95
surpass        0.95
undermines     0.94
horr           0.94
prevents       0.93
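The logit lists rank vocabulary tokens by how strongly the feature raises (positive) or lowers (negative) their output logits. Dashboards of this kind commonly get such lists by projecting the feature's output direction through the model's unembedding matrix, but this page does not say how its values were computed, so the sketch below is an assumption. The function name `top_logit_effects` and its arguments are hypothetical, and the toy vocabulary and weights only borrow a few values from the lists to exercise the code.

```python
import numpy as np

def top_logit_effects(direction, W_U, vocab, k=10):
    """Hypothetical sketch: rank tokens by a feature's direct logit effect.

    direction: (d_model,) output direction of the feature (assumed input)
    W_U:       (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:     list of d_vocab token strings
    """
    logits = direction @ W_U              # (d_vocab,) effect on each token's logit
    order = np.argsort(logits)            # token indices, most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model == d_vocab == 5 and an identity unembedding,
# so each token's logit effect is just the matching entry of `direction`.
vocab = ["resembles", "exceeds", "MQ", "Doing", "the"]
W_U = np.eye(5)
direction = np.array([1.14, 1.08, -0.63, -0.62, 0.0])
neg, pos = top_logit_effects(direction, W_U, vocab, k=2)
print(neg)
print(pos)
```

With a real model, `W_U` would be the full unembedding and `k=10` would match the length of the lists on this page.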
Activation Density 0.132%
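A density of 0.132% means the feature fires on roughly 1.32 of every 1,000 tokens. Here is a minimal sketch, assuming density is the share of tokens whose activation is above zero. The page does not state its exact threshold or dataset, and `activation_density` is a hypothetical helper name.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`.

    The zero threshold is an assumption; dashboards may use a different cutoff.
    """
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy data: 1,320 active tokens out of 1,000,000 gives the density shown above.
acts = np.zeros(1_000_000)
acts[:1320] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.132%
```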