INDEX
Explanations
phrases containing the word "no"
Negative Logits
midt      -0.81
worn      -0.66
ulative   -0.66
ivic      -0.66
rored     -0.63
endish    -0.63
iership   -0.63
otton     -0.61
inarily   -0.60
erala     -0.60
Positive Logits
xious              1.06
matter             1.02
avail              0.99
terday             0.94
longer             0.93
ct                 0.93
oooooooooooooooo   0.85
sooner             0.83
isy                0.83
harm               0.83
Activations Density: 0.068%