Explanations
phrases indicating abundance or availability
Negative Logits
Token     Logit
"';       -0.16
cycle     -0.15
asthan    -0.14
mq        -0.14
pee       -0.14
ège       -0.13
IVATE     -0.13
phans     -0.13
gì        -0.13
536       -0.13
Positive Logits
Token     Logit
yyy       0.18
yyyy      0.17
enough    0.15
sı        0.15
uh        0.15
irth      0.14
enne      0.14
erc       0.14
eva       0.14
ois       0.14
Activations Density: 0.021%