Explanations
phrases related to social interaction and communication
Negative Logits
cott      -0.64
zn        -0.62
peria     -0.62
aft       -0.59
prus      -0.57
ciples    -0.56
haps      -0.56
zik       -0.55
fork      -0.54
GE        -0.54
Positive Logits
interactions    0.81
ivity           0.79
ually           0.79
ively           0.74
interaction     0.73
uate            0.73
iences          0.72
ioned           0.67
uality          0.65
ivating         0.65
Activations Density: 7.778%