INDEX
Explanations
phrases related to involvement or participation
phrases indicating participation or involvement in activities
Negative Logits
overe         -0.69
ek            -0.68
survived      -0.68
aut           -0.66
bene          -0.65
nor           -0.65
Interstitial  -0.64
theless       -0.63
lect          -0.61
ieth          -0.61
Positive Logits
OOL           0.80
Rid           0.74
groove        0.69
sidx          0.68
LET           0.67
retty         0.64
skirts        0.63
snipp         0.63
dirty         0.62
jams          0.61
Activations Density 0.191%