INDEX
Explanations
words related to eligibility, responsibility, and feasibility
Negative Logits
MB      -0.24
MB      -0.21
iegel   -0.19
γε      -0.18
Browse  -0.17
_MB     -0.17
Brow    -0.16
Bond    -0.16
Brend   -0.15
IED     -0.15
Positive Logits
ib      0.89
ib      0.70
иб      0.61
IB      0.61
Ib      0.60
ibi     0.60
.ib     0.55
िब      0.54
іб      0.54
ibe     0.52
Activations Density 0.069%