INDEX
Explanations
common articles and determiners in text
Negative Logits
ÙħÙĪØ¯    -0.15
olland    -0.15
μη    -0.15
isman    -0.15
adder    -0.14
addin    -0.14
Hin    -0.13
UDIO    -0.13
tal    -0.13
tered    -0.13
Positive Logits
particular    0.42
given    0.41
given    0.34
PARTICULAR    0.31
Given    0.27
_given    0.27
Given    0.26
GIVEN    0.26
particul    0.25
icular    0.22
Activations Density: 0.215%