Explanations
phrases indicating success or effectiveness
expressions indicating effectiveness or success
Negative Logits
Token     Logit
ategory   -0.75
agine     -0.69
htaking   -0.69
rush      -0.68
ilities   -0.67
guyen     -0.67
hyde      -0.66
avorite   -0.66
amera     -0.66
ilion     -0.65
Positive Logits
Token     Logit
enough    1.30
enough    1.19
Enough    0.91
bye       0.81
behaved   0.80
baum      0.80
esley     0.77
spring    0.77
suited    0.75
vers      0.69
Activations Density: 0.039%