INDEX
Explanations
text related to official correspondence or communication in written form
occurrences of the word "letter" in various contexts
Negative Logits
rans          -0.68
oken          -0.67
orsi          -0.66
Tune          -0.63
tics          -0.62
Abel          -0.61
oleon         -0.61
Occupations   -0.61
Nex           -0.60
illon         -0.60
Positive Logits
letter        0.98
Letter        0.94
inbox         0.91
velop         0.90
letter        0.88
letters       0.85
press         0.82
boxing        0.81
penned        0.80
worms         0.79
Activations Density 0.025%