Explanations
phrases indicating companionship or relationships
Negative Logits
éļĨ              -0.16
.Typed           -0.15
zeug             -0.15
ks               -0.14
corres           -0.14
750              -0.14
çĹ               -0.14
šak              -0.13
corresponding    -0.13
064              -0.13
Positive Logits
ieri     0.18
ignon    0.17
PCP      0.15
ecome    0.14
ulty     0.14
.nd      0.14
mav      0.14
iani     0.14
öff      0.14
ATAL     0.14
Activations Density 0.080%