Explanations
references to spatial relationships and proximity
Negative Logits
<bos>      -0.47
\          -0.42
uitges     -0.40
deney      -0.36
}^{\       -0.36
tjen       -0.35
achelor    -0.33
Denial     -0.33
faculty    -0.32
Dea        -0.32
Positive Logits
Around     1.45
around     1.45
AROUND     1.44
around     1.39
Around     1.38
AROUND     1.28
autour     1.06
вокруг     1.05
alrededor  0.99
okolo      0.98
Activations Density 0.091%