INDEX
Explanations
instances of the word "only."
Negative Logits
#End     -0.19
tright   -0.16
etter    -0.16
usz      -0.15
iling    -0.15
UILD     -0.15
ingly    -0.15
rac      -0.15
esy      -0.15
ishly    -0.14
Positive Logits
endor     0.16
/or       0.16
eparator  0.15
th        0.14
osh       0.14
ewood     0.14
yaw       0.14
apur      0.14
isd       0.13
horia     0.13
Activations Density 0.012%