Explanations
Instances of the word "write" and its variations (such as "writing" and "writes"), indicating a focus on writing actions or commands.
Negative Logits
Token      Logit
Trop       -0.58
ganggu     -0.53
従         -0.50
Cougars    -0.50
terecht    -0.49
Kang       -0.48
)++;       -0.48
orghini    -0.48
cope       -0.47
nemico     -0.47
Positive Logits
Token      Logit
write      1.74
write      1.66
Write      1.56
Write      1.55
writing    1.50
Writing    1.36
Writing    1.34
WRITE      1.33
writing    1.33
writes     1.32
Activations Density 0.126%