Explanations
phrases related to specific locations or areas
references to cities and local contexts
Negative Logits
Token        Logit
Purs         -0.62
idate        -0.57
chieve       -0.57
awaru        -0.56
confir       -0.56
culminated   -0.55
succeeds     -0.54
Straw        -0.53
ukong        -0.53
puzz         -0.52
Positive Logits
Token          Logit
anyways         1.11
whereas         1.04
anyway          1.01
nowadays        0.90
comparatively   0.83
unlike          0.82
so              0.81
rather          0.80
:(              0.79
(~              0.78
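The two lists above rank tokens by a per-token logit score, keeping the most negative and most positive entries. The sketch below shows one way such lists could be produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's output direction with that token's column of the model's unembedding matrix. The function names and the toy numbers in the usage note are hypothetical and do not reproduce the values listed here.

```python
from typing import List, Sequence, Tuple


def project_onto_unembedding(
    feature_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> List[float]:
    """Score every vocabulary token against a feature direction.

    Assumption: `unembed` has shape d_model x vocab_size (one row per model
    dimension), and a token's score is the dot product of `feature_dir` with
    that token's column.
    """
    d_model = len(feature_dir)
    vocab_size = len(unembed[0])
    return [
        sum(feature_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_logits(
    scores: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    # Sort ascending by score; the front holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    # Reverse so the most positive tokens come first.
    positive = ranked[::-1][:k]
    return negative, positive
```

For example, with a two-dimensional toy direction `[1.0, 2.0]` and a three-token vocabulary `["a", "b", "c"]` whose unembedding is `[[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]`, the scores are `[1.0, 2.0, 1.0]`. `top_logits(scores, vocab, k=1)` then places `"a"` in the negative list and `"b"` in the positive list.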
Activation Density: 0.607%
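The density figure is a proportion of tokens, which is easy to misread. The sketch below makes the arithmetic concrete. It assumes, as this page does not state, that density means the share of token positions where the feature's activation is above zero. Under that reading, 0.607% corresponds to roughly 6 of every 1,000 tokens. The function names are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activates above `threshold`.

    Assumption: "active" means strictly greater than zero by default.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def format_density(density: float) -> str:
    """Render a density fraction as a percentage with three decimals."""
    return f"{density * 100:.3f}%"
```

For example, a feature that fires on 607 of 100,000 tokens has a density of 0.00607, which `format_density` renders as `0.607%`.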