INDEX
Explanations
the presence of the word "block" in various contexts
Negative Logits
prest          -0.83
appropri       -0.74
tti            -0.67
reluct         -0.66
hospital       -0.66
ppe            -0.66
subdu          -0.65
toget          -0.65
composition    -0.60
seaf           -0.60
Positive Logits
able           1.18
ables          1.07
ages           1.05
ers            1.05
ishable        0.98
ances          0.96
ment           0.94
zee            0.94
ades           0.93
aded           0.93
Activations Density: 0.003%