INDEX
Explanations
parenthetical statements
open parentheses
Negative Logits
inund          -0.81
spir           -0.78
integ          -0.77
overrun        -0.75
undet          -0.75
unus           -0.75
stagn          -0.74
overwhelmed    -0.74
overhaul       -0.73
appropri       -0.71
Positive Logits
…)             1.47
laughs         1.32
Laughs         1.29
...)           1.23
hide           1.20
See            1.17
emphasis       1.14
Unless         1.13
Ironically     1.10
Though         1.09
Activations Density 0.059%