INDEX
Explanations
phrases related to suppression and inhibition
Negative Logits
Dok      -0.71
Peter    -0.68
Dok      -0.68
Beek     -0.67
ek       -0.66
تط       -0.64
ⓧ        -0.63
k        -0.62
Kinder   -0.62
Peter    -0.60
Positive Logits
suppress      1.63
Suppression   1.54
SUP           1.49
SUP           1.48
suppression   1.46
Sup           1.44
suppresses    1.43
suppressed    1.42
suppressor    1.40
Supp          1.38
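The dashboard does not say how these logit lists were computed. A common approach in sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive tokens. The sketch below assumes that method; `decoder_dir`, `W_U` (shape `d_model x n_vocab`), `vocab`, and the toy numbers are all hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Score every token as decoder_dir @ W_U[:, token] (assumed method)."""
    d_model = len(decoder_dir)
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the (hypothetical) feature direction with token t's unembedding column.
        s = sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        scores.append((token, s))
    return scores


def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary (made-up values).
decoder_dir = [1.0, 0.0]
W_U = [[1.5, -0.7, 0.2],
       [0.0, 0.0, 0.0]]
vocab = ["suppress", "Dok", "the"]

neg, pos = top_and_bottom(logit_effects(decoder_dir, W_U, vocab), k=1)
print(neg, pos)
```

Under this assumption, a large positive score means the feature directly pushes the model toward predicting that token (here, "suppress"-like tokens), while a negative score means it pushes away from it.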
Activation Density: 0.200%
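Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.200% would then mean about 2 firing tokens per 1,000. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name, the threshold, and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1,000 tokens, the feature fires on 2 of them.
acts = [0.0] * 998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")
```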