INDEX
Explanations
mentions or references to an audience
references to an audience
Negative Logits
Scot    -0.67
loo     -0.67
Plum    -0.64
imov    -0.64
erald   -0.63
uel     -0.63
idy     -0.63
nir     -0.62
pire    -0.61
Ide     -0.61
Positive Logits
member          0.92
members         0.89
surrogate       0.82
members         0.80
participation   0.76
audience        0.74
atics           0.73
tuning          0.73
tuned           0.71
laughter        0.70
Activations Density 0.043%