INDEX
Explanations
terms related to tax, money, and policy
words related to injustice or societal issues
Negative Logits
erest           -0.72
put             -0.71
urg             -0.66
street          -0.63
hens            -0.63
lap             -0.61
independence    -0.61
recy            -0.61
shaved          -0.60
hem             -0.59
Positive Logits
iser            1.00
iatus           0.97
TAIN            0.85
ãĥķãĤ¡          0.83
otonin          0.80
ãĥī             0.80
ãĤ¼ãĤ¦ãĤ¹       0.78
isers           0.74
Ø©              0.73
lift            0.72
Activations Density 0.006%
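As a rough illustration of what a density of this size means, the sketch below computes the fraction of token positions on which a feature is active. It assumes "density" means the share of sampled tokens with a non-zero activation; the dashboard does not state its exact definition. The activation values are synthetic, not taken from this feature, and `activation_density` is a hypothetical helper name.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is defined as active positions / total positions.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example data: 6 active positions out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in (10, 500, 2_000, 40_000, 75_000, 99_999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

With these made-up numbers the feature fires on 6 of every 100,000 tokens, which is the same order of sparsity as the figure reported above.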