Explanations
references to tubes or cylindrical objects
Negative Logits
vironment    -0.79
nee          -0.76
inoa         -0.71
nesota       -0.70
Lowry        -0.69
lied         -0.67
Herb         -0.66
acebook      -0.66
utenant      -0.66
riad         -0.65
Positive Logits
tubes         0.99
tube          0.97
diameter      0.81
anus          0.80
ength         0.78
Britann       0.77
ules          0.76
pus           0.73
lengths       0.72
protr         0.72
Activation Density: 0.005%