Explanations
measurements and details related to distances and directions
Negative Logits
inger     -0.17
iolet     -0.16
γον       -0.16
uesta     -0.15
.CG       -0.15
INGER     -0.15
ormsg     -0.15
swire     -0.14
ienes     -0.14
ovah      -0.14
Positive Logits
Pruitt     0.14
Pull       0.14
erot       0.14
llib       0.14
ling       0.14
Armstrong  0.14
Rou        0.14
Arm        0.14
Punk       0.14
bear       0.14
Activation density: 0.002%