Explanations
references to academic disciplines and specific fields of study
occurrences of the word "letter" and its variations
Negative Logits
ichita    -0.85
resses    -0.75
anamo     -0.70
akening   -0.67
ourke     -0.67
ij士      -0.66
illed     -0.66
innacle   -0.65
Aust      -0.65
stood     -0.64
Positive Logits
Letter    1.29
Letter    1.10
letters   1.03
letter    0.99
Letters   0.93
worms     0.80
gments    0.79
glers     0.79
Emails    0.78
marine    0.77
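Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; this page does not say how its lists were computed. The function name `top_logit_tokens` and the toy matrices in the usage example are hypothetical.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Assumed method: project one feature's decoder direction (shape d_model)
    through the unembedding W_U (shape d_model x vocab_size), then return the
    k lowest- and k highest-scoring (token, logit) pairs."""
    logits = w_dec_row @ W_U               # one logit per vocabulary token
    order = np.argsort(logits)             # token indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy example: a 2-dimensional model with a 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[1.29, -0.85, 0.50, 0.00],
                [0.00,  0.00, 0.00, 0.00]])
vocab = ["letter", "ichita", "marine", "stood"]
neg, pos = top_logit_tokens(w, W_U, vocab, k=2)
print(neg)
print(pos)
```

Sorting once with `argsort` gives both ends of the ranking. For a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.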
Activations Density 0.006%
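An activation density of 0.006% means the feature fires on roughly 6 out of every 100,000 tokens. The sketch below computes density as the fraction of token positions whose activation exceeds a threshold. The zero threshold and the sample size are assumptions, since this page gives neither. The function name `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation is above
    `threshold` (assumed to be 0.0, i.e. the feature "fires")."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())

# Synthetic example: 100,000 token positions, of which the feature fires on 6.
acts = np.zeros(100_000)
acts[:6] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```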