Explanations
references to drinking alcohol and its consequences
Negative Logits
Token          Logit
               -1.04
-              -0.93
—              -0.80
itſelf         -0.80
Shakspeare     -0.78
Etats          -0.78
tvguidetime    -0.77
Arki           -0.77
Hift           -0.75
pleaſure       -0.75
Positive Logits
Token          Logit
...            2.79
…              2.34
..             1.76
..."           1.73
....           1.70
...,           1.62
...)           1.61
...'           1.54
·              1.49
...            1.47
Activations Density 0.186%