INDEX
Explanations
references to physical spaces and their qualities
Negative Logits
Token        Logit
trl          -0.17
Fairfax      -0.15
essen        -0.15
lip          -0.15
rex          -0.15
la           -0.15
reds         -0.14
acades       -0.14
ross         -0.14
ifestyles    -0.14
Positive Logits
Token              Logit
yonel              0.20
bilt               0.17
(no token text)    0.16
/time              0.16
ful                0.16
holders            0.15
erif               0.15
leigh              0.15
yb                 0.15
fill               0.14
Activations Density 0.065%