Explanations
phrases related to instructions, schedules, and events
Negative Logits
voy          -0.75
woman        -0.68
anooga       -0.65
awei         -0.64
azor         -0.63
going        -0.63
breaks       -0.61
res          -0.60
leground     -0.59
rier         -0.57
Positive Logits
participate   1.00
accompany     0.98
accommodate   0.94
explore       0.93
ensure        0.92
join          0.91
satisfy       0.90
receive       0.88
partake       0.86
speak         0.86
Activation Density: 0.219%