Explanations
phrases related to interventions or decision-making
instances of special characters or symbols in the text
Negative Logits
Token      Logit
Brist      -0.73
enegger    -0.72
Thomson    -0.70
airs       -0.67
itsch      -0.65
Manhattan  -0.65
Tob        -0.64
Strat      -0.64
Shap       -0.63
Borough    -0.63
Positive Logits
Token      Logit
¬          1.60
Ļ          1.49
ı          1.23
ĸ          1.21
ª          1.17
¾          1.16
ļ          1.15
¡          1.15
ħ          1.14
ľ          1.13
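
The symbols in the positive-logit list look like the printable stand-ins that byte-level BPE tokenizers use for raw bytes. The page does not say which tokenizer produced them, so the sketch below assumes the GPT-2 byte-to-character table and should be read as illustrative only. The names `CHAR_TO_BYTE`, `POSITIVE_TOKENS`, and `is_utf8_continuation` are hypothetical helpers introduced for this example.

```python
def bytes_to_unicode():
    """Byte -> printable-character table, as used by GPT-2-style byte-level BPE.

    ASSUMPTION: the dashboard's model uses this mapping; the source does not say so.
    """
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> underlying byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}

# Tokens copied from the positive-logit list on this page.
POSITIVE_TOKENS = ["¬", "Ļ", "ı", "ĸ", "ª", "¾", "ļ", "¡", "ħ", "ľ"]


def is_utf8_continuation(byte):
    """True if the byte has the form 10xxxxxx (a UTF-8 continuation byte)."""
    return 0x80 <= byte <= 0xBF


for tok in POSITIVE_TOKENS:
    byte = CHAR_TO_BYTE[tok]
    print(f"{tok!r} -> 0x{byte:02X} continuation={is_utf8_continuation(byte)}")
```

If the assumption holds, each of the ten symbols decodes to a single byte in the range 0x80 to 0xBF, which is the range of UTF-8 continuation bytes. That would fit the explanation "instances of special characters or symbols in the text", since multi-byte characters are the ones that need such bytes.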
Activations Density: 0.384%
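
The page does not define how the density figure is computed. A common reading is the fraction of token positions on which the feature fires, and the sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    ASSUMPTION: "density" means the share of tokens with a non-zero activation;
    the source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the feature fires on 2 of 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this definition, a density of 0.384% would mean the feature is active on roughly 4 of every 1,000 tokens in the sampled text.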