INDEX
Explanations
words or phrases related to physical movement or location
Negative Logits
Token     Logit
er        -0.20
ticker    -0.15
hopeless  -0.15
eu        -0.15
Shelley   -0.15
ccione    -0.14
ณฑ        -0.14
ivity     -0.14
avec      -0.14
Bam       -0.14
Positive Logits
Token     Logit
old       0.30
olds      0.27
ills      0.25
OLD       0.21
olding    0.20
old       0.19
older     0.19
.old      0.19
eld       0.19
ill       0.18
Activations Density 0.004%