Explanations
phrases related to human interaction
instances of the word "interact" and related forms
Negative Logits
cision        -0.72
ft            -0.71
prus          -0.67
aft           -0.66
enthal        -0.65
FC            -0.64
fc            -0.64
statement     -0.64
nic           -0.63
secution      -0.62
Positive Logits
ivity          1.08
interact       1.06
uate           1.00
interacts      0.99
interactions   0.99
interacted     0.99
interacting    0.93
interaction    0.90
ively          0.86
acebook        0.86
Activations Density 0.011%