Explanations
mentions of cans
references to containers, particularly those used for food and beverages
NEGATIVE LOGITS
Token         Logit
RESULTS       -0.73
dishon        -0.66
privileged    -0.63
Advis         -0.62
~~~~~~~~      -0.62
repayment     -0.60
striving      -0.60
Fighter       -0.59
Strikes       -0.59
wedd          -0.59
POSITIVE LOGITS
Token         Logit
vas           1.18
't            1.15
isters        1.07
berra         1.05
ister         1.04
adian         0.99
opy           0.95
atell         0.94
avan          0.91
ibal          0.90
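Logit lists like these are usually read as the feature's direct effect on next-token prediction. The sketch below is a minimal illustration, assuming the common SAE convention that the values come from multiplying the feature's decoder direction (a hypothetical `decoder_row`, i.e. one row of W_dec) by the model's unembedding matrix (`unembed`, i.e. W_U) and then taking the k largest and k smallest entries. The helper name, toy matrices, and vocabulary are made up for illustration and are not this feature's weights.

```python
def top_logit_effects(decoder_row, unembed, vocab, k):
    """Return the k most positive and k most negative direct logit effects.

    decoder_row: d_model floats, assumed to be the feature's row of W_dec.
    unembed: d_model rows of vocab_size floats, assumed to be W_U.
    vocab: vocab_size token strings, aligned with the columns of unembed.
    """
    d_model = len(decoder_row)
    # logits[j] = sum_i decoder_row[i] * W_U[i][j], i.e. decoder direction @ W_U
    logits = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    top_pos = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_neg = sorted(pairs, key=lambda p: p[1])[:k]
    return top_pos, top_neg


# Toy example (hypothetical numbers): d_model = 3, vocabulary of 5 tokens.
vocab = ["a", "b", "c", "d", "e"]
decoder_row = [1.0, 0.0, -1.0]
unembed = [
    [0.5, 0.1, -0.2, 0.0, 0.3],
    [0.2, 0.9, 0.4, -0.7, 0.1],
    [0.1, 0.4, 0.6, -0.5, 0.0],
]
top_pos, top_neg = top_logit_effects(decoder_row, unembed, vocab, k=2)
print(top_pos)  # tokens "d" then "a"
print(top_neg)  # tokens "c" then "b"
```

Because this projection skips every later layer, it captures only the direct path from the feature to the logits, which is one reason subword fragments such as "isters" or "adian" can appear at the top of such lists.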
Activation density: 0.051%
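Activation density is the fraction of token positions on which the feature is active. Below is a minimal sketch, assuming "active" means an activation strictly greater than zero (as for a ReLU-style SAE). The `activation_density` helper is hypothetical. The toy stream is constructed so that 51 of 100,000 tokens fire. This reproduces the 0.051% figure only as arithmetic, not as this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature is active.

    Assumes "active" means activation > threshold (0.0 for a ReLU SAE).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy stream (hypothetical): 51 firing tokens out of 100,000.
acts = [0.0] * (100_000 - 51) + [1.7] * 51
density = activation_density(acts)
print(f"{density:.3%}")  # 0.051%
```

A density this low means the feature fires on roughly one token in two thousand, consistent with a narrow concept such as mentions of cans.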