INDEX
Explanations
phrases indicating fitting or compatibility with a particular context
phrases indicating compatibility or fitting within contexts or categories
Negative Logits
heny        -0.69
wl          -0.61
pora        -0.60
Dhabi       -0.60
Accessed    -0.59
alcohol     -0.57
gov         -0.57
xious       -0.57
matter      -0.56
casters     -0.56
Positive Logits
precon          0.79
stereotype      0.76
ãĤ¤ãĥĪ          0.71
Position        0.70
bounds          0.70
Interstitial    0.67
criteria        0.65
stereotypes     0.65
oin             0.65
stereotypical   0.64
Activations Density 0.155%
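
The positive-logit token "ãĤ¤ãĥĪ" looks like a raw byte-level BPE vocabulary string rather than corrupted text. The sketch below shows one way to turn such a string back into readable characters. It assumes the underlying model uses the GPT-2 style byte-to-unicode table, which the dashboard itself does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed here)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned the next free code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map each vocabulary character back to its byte, then decode as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A token can end mid-character, so undecodable bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ¤ãĥĪ"))
```

Under that assumption the token decodes to two katakana characters. If the model uses a different tokenizer, the mapping above does not apply and the string should be read as-is.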