Explanations
comma-separated lists of actions or items
punctuation marks and their frequency
Negative Logits
Ī          -0.69
gate       -0.65
Ŀ          -0.61
Yemeni     -0.59
¬¼         -0.57
ilege      -0.56
suits      -0.55
Rice       -0.54
essee      -0.54
Viz        -0.54
Positive Logits
pee         0.74
pless       0.71
while       0.67
walk        0.66
apa         0.64
but         0.64
Wars        0.64
depending   0.63
albeit      0.62
depending   0.62
Activations Density 0.465%