INDEX
Explanations
phrases related to interactions between individuals or groups
instances of the word "interaction" and its related forms
Negative Logits
zn       -0.86
prus     -0.83
aft      -0.82
cott     -0.81
cise     -0.77
haps     -0.75
enic     -0.74
ussy     -0.74
skip     -0.72
amina    -0.71
Positive Logits
interactions   1.22
interaction    1.11
interacts      0.83
iquette        0.82
ivity          0.81
interacted     0.79
uality         0.77
interacting    0.73
ively          0.72
interact       0.71
Activations Density: 0.010%
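
As a rough illustration of how a density figure like the one above can be read, the sketch below treats activation density as the fraction of tokens on which a feature's activation is above zero. This definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; the page itself does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The dashboard does not document its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 token out of 10,000.
toy_activations = [0.0] * 9999 + [3.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.010%
```

Under this reading, a density of 0.010% corresponds to the feature firing on about one token in every ten thousand, which fits a feature tied to a narrow family of words such as "interaction" and its related forms.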