INDEX
Explanations
words containing the substring "ns" with a high level of activation
the presence of the word "ns" in various contexts
Negative Logits
MAD          -0.68
gri          -0.64
plac         -0.63
blast        -0.59
Madness      -0.59
adjustment   -0.59
freeze       -0.59
absentee     -0.58
fine         -0.57
starship     -0.57
Positive Logits
ns           4.57
ns           1.73
ls           1.52
nces         1.52
NS           1.50
nc           1.49
n            1.48
nt           1.43
nn           1.37
nes          1.35
Activations Density 0.010%