INDEX
Explanations
parts related to structured text, such as document sections and titles
Negative Logits
berus     -0.69
anca      -0.67
apons     -0.60
gently    -0.59
asting    -0.57
speakers  -0.57
acha      -0.57
cumbers   -0.57
gaping    -0.57
okia      -0.57
Positive Logits
ners      1.25
icularly  1.22
icular    1.20
nered     1.18
ridge     1.18
isans     1.13
ially     1.10
icles     1.07
icipated  1.03
ner       1.03
Activations Density: 0.023%