INDEX
Explanations
mentions of specific amounts of money
phrases related to monetary amounts
Negative Logits
resemb        -0.66
worldly       -0.64
chin          -0.63
behavi        -0.61
dismant       -0.58
strugg        -0.58
resemblance   -0.57
curv          -0.57
adherence     -0.56
Anarchy       -0.55
Positive Logits
000           2.23
001           1.53
0000          1.43
00            1.42
000           1.30
0002          1.27
002           1.13
00000         1.12
008           1.11
0001          1.11
Activations Density: 0.050%