INDEX
Explanations
phrases expressing gratitude or excitement for a situation, achievement, or role
expressions of existence or being
Negative Logits
rones            -0.72
rigs             -0.69
plings           -0.68
preparations     -0.68
strain           -0.66
dispute          -0.66
strains          -0.65
Prescott         -0.65
reconstruction   -0.63
itures           -0.63
Positive Logits
able             1.28
reminded         0.90
bitten           0.89
judged           0.89
leeve            0.88
league           0.88
entertained      0.86
auc              0.85
aware            0.85
reunited         0.85
Activations Density: 0.115%