INDEX
Explanations
words related to location or spatial relationships
Negative Logits
Token      Logit
UFF        -0.16
lessly     -0.15
pton       -0.15
ide        -0.15
elon       -0.14
usable     -0.14
estates    -0.14
illas      -0.14
estate     -0.14
spiel      -0.14
Positive Logits
Token         Logit
lie           0.19
neath         0.18
ausp          0.16
Invariant     0.16
á»ı           0.15
weg           0.15
supervision   0.15
werp          0.15
ander         0.15
ancode        0.14
Activation Density: 0.008%