INDEX
Explanations
references to the specific term "Box"
references to specific 'Box' categories or labels
Negative Logits
ittee     -0.83
ufact     -0.80
puter     -0.78
merce     -0.75
FUL       -0.70
asury     -0.69
ptoms     -0.67
opoulos   -0.66
beh       -0.65
CSI       -0.65
Positive Logits
es      1.07
Box     1.01
er      0.98
Box     0.97
wra     0.97
esy     0.97
sets    0.96
boxes   0.94
eers    0.94
box     0.93
Activations Density: 0.012%