INDEX
Explanations
instances of the word "feature" and its variations in various contexts
Negative Logits
sWith     -0.16
arin      -0.16
enheim    -0.16
/fire     -0.15
ni        -0.14
fever     -0.14
oper      -0.14
омер      -0.14
ners      -0.14
ses       -0.14
Positive Logits
prominently    0.35
tte            0.26
691            0.17
eting          0.17
547            0.16
ettings        0.15
eted           0.15
472            0.15
itarian        0.15
ilities        0.15
Activations Density 0.034%