INDEX
Explanations
phrases indicating disagreement or differing perspectives
Negative Logits
Down        -0.15
DOWN        -0.15
Vac         -0.14
vac         -0.14
down        -0.14
ridor       -0.14
Blank       -0.14
McCartney   -0.14
tube        -0.13
Decompiled  -0.13
Positive Logits
outside      0.21
outside      0.20
independent  0.20
independ     0.19
recur        0.19
外           0.19
独立         0.18
external     0.18
recurrence   0.18
Outside      0.18
Activations Density 0.056%