INDEX
Explanations
phrases related to financial transactions and advice
Negative Logits
folks          -0.25
substantive    -0.23
assistance     -0.23
aspects        -0.22
assessment     -0.22
ongoing        -0.21
longstanding   -0.21
subsequent     -0.21
standout       -0.21
requisite      -0.20
Positive Logits
famous         0.33
completely     0.29
stupid         0.23
correct        0.22
forbidden      0.22
corrupted      0.22
explicitly     0.20
immortal       0.20
literally      0.20
absolutely     0.20
Activations Density: 0.078%