INDEX
Explanations
instances of the word "write" or related words like "writing" or "written."
occurrences of the word "write" and its variations indicating acts of writing
Negative Logits
Ĭ±         -0.89
agara      -0.77
abe        -0.73
azar       -0.73
EGA        -0.71
Ukrain     -0.71
ega        -0.67
imation    -0.66
Unsure     -0.66
allows     -0.65
Positive Logits
poems      0.90
writing    0.87
smanship   0.87
writer     0.85
acters     0.82
poetry     0.77
writers    0.76
poem       0.76
essays     0.76
itatively  0.75
Activations Density: 0.047%