Explanations
references to physical or conceptual spaces
Negative Logits
trl        -0.17
lip        -0.17
складу     -0.16
rex        -0.15
afa        -0.15
ross       -0.15
thal       -0.14
aires      -0.14
l          -0.14
cul        -0.14
Positive Logits
yonel      0.21
-temp      0.18
/time      0.18
yb         0.16
holders    0.15
flight     0.15
bru        0.15
ful        0.15
uits       0.15
-time      0.14
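The lists give each token's logit value but not how it was obtained. A common convention for SAE feature dashboards, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below does that with random toy weights; `top_logit_effects`, `w_dec_f`, and `w_u` are hypothetical names, and the final layer norm is ignored for simplicity.

```python
import numpy as np


def top_logit_effects(w_dec_f, w_u, vocab, k=10):
    """Rank vocabulary tokens by the direct logit effect of one feature.

    Assumption: the effect is the decoder direction projected through the
    unembedding (w_dec_f @ w_u), ignoring the final layer norm. This is a
    common convention, not necessarily how the lists above were produced.
    """
    effects = w_dec_f @ w_u          # shape: (vocab_size,)
    order = np.argsort(effects)      # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with random weights and hypothetical sizes.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec_f = rng.normal(size=d_model)
w_u = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_effects(w_dec_f, w_u, vocab, k=5)
print(neg)  # most negative effects first
print(pos)  # most positive effects first
```

Sorting once with `argsort` and slicing both ends yields the negative and positive lists from a single pass over the vocabulary.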
Activation density: 0.055%
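Activation density is the fraction of token positions on which the feature fires, so 0.055% corresponds to roughly 55 of every 100,000 tokens. The minimal sketch below assumes "fires" means an activation strictly above zero; `activation_density` and the toy data are hypothetical, not taken from the page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy data: 200,000 token positions, 110 of them active (hypothetical).
acts = np.zeros(200_000)
acts[:110] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.055%
```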