Explanations
Words related to bills and money
References to specific names, brands, and titles, especially in cultural or literary contexts
Negative logits
Token       Logit
vernment    -0.87
ibles       -0.84
ierce       -0.80
eller       -0.79
ressive     -0.79
ially       -0.75
ression     -0.72
cean        -0.71
herty       -0.71
iled        -0.71
Positive logits
Token       Logit
laughter     0.84
phrase       0.73
pool         0.73
take         0.73
LOAD         0.72
dial         0.71
hao          0.71
bour         0.71
Mans         0.66
onest        0.65
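One common way to produce logit lists like the two above is to project a feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the result. The sketch below assumes exactly that. The names `w_dec` (decoder vector, length d_model), `W_U` (d_model x vocab unembedding), `logit_effects`, and `top_and_bottom` are hypothetical, and the numbers are toy values rather than this feature's weights. The dashboard may also normalize or scale its scores differently.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the output logits.
# Assumes score[v] = dot(w_dec, W_U[:, v]); toy data, not this feature's actual weights.

def logit_effects(w_dec, W_U):
    """Return one score per vocab token: the dot product of w_dec with each column of W_U."""
    d_model = len(w_dec)
    vocab_size = len(W_U[0])
    return [sum(w_dec[d] * W_U[d][v] for d in range(d_model)) for v in range(vocab_size)]

def top_and_bottom(scores, tokens, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

tokens = ["laughter", "phrase", "vernment", "ibles"]
w_dec = [1.0, 0.5]                 # toy decoder direction (d_model = 2)
W_U = [
    [0.6, 0.5, -0.7, -0.4],        # row 0 of a toy 2 x 4 unembedding matrix
    [0.4, 0.2, -0.3, -0.8],        # row 1
]

scores = logit_effects(w_dec, W_U)
positive, negative = top_and_bottom(scores, tokens, 2)
print("positive:", [t for t, _ in positive])   # laughter, phrase
print("negative:", [t for t, _ in negative])   # vernment, ibles
```

A negative score means the feature pushes that token's logit down when it fires; a positive score means it pushes it up.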
Activation density: 0.072%
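Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero and uses a made-up sequence of 100,000 activations. The dashboard's exact threshold and sample corpus are not stated here, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the percentage of token positions
# whose activation exceeds a threshold (assumed here to be 0.0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions where the feature's activation is above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: the feature is active on 72 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 72_000, 1_000):
    acts[i] = 1.5

print(f"{activation_density(acts):.3f}%")  # prints 0.072%
```

Under this reading, 0.072% means the feature is active on roughly 7 of every 10,000 tokens, so it is a sparse feature.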