INDEX
Explanations
weights in kilograms
quantitative measures of weight
Negative Logits
gotten       -0.74
bidden       -0.70
protected    -0.66
western      -0.66
vice         -0.66
Ire          -0.66
href         -0.65
structed     -0.65
bes          -0.62
Dee          -0.62
Positive Logits
kg           1.10
kg           1.01
atsu         0.83
enger        0.82
kilograms    0.80
cup          0.78
ammonia      0.77
JV           0.77
ingly        0.76
vl           0.75
Activations Density 0.005%