INDEX
Explanations
phrases starting with a specific symbol followed by a combination of letters and numbers
a specific character that appears repeatedly in the text
Negative Logits
seiz      -0.72
hors      -0.64
RIS       -0.63
pastry    -0.62
fortun    -0.61
trainers  -0.60
shack     -0.59
ozy       -0.58
Seym      -0.58
Palest    -0.57
Positive Logits
¢   0.98
¡   0.97
¤   0.93
£   0.89
ª   0.89
ħ   0.88
Ļ   0.87
Ĭ   0.86
ı   0.83
ķ   0.83
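The two lists above rank vocabulary tokens by the feature's direct effect on the output logits. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder vector through the model's unembedding matrix and keep the k lowest and k highest entries. This is a minimal sketch on toy NumPy arrays: `top_logit_effects`, `w_dec_row`, and `w_u` are hypothetical names, and any scaling or normalization the dashboard applies to the displayed values is not reproduced.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by one feature's direct logit effect (hypothetical helper).

    w_dec_row: shape (d_model,), the feature's decoder vector (assumed input).
    w_u: shape (d_model, d_vocab), the model's unembedding matrix (assumed input).
    Returns (negative, positive): k (token, value) pairs each, most extreme first.
    """
    logits = w_dec_row @ w_u          # shape (d_vocab,): effect on each token's logit
    order = np.argsort(logits)        # token indices, ascending by effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 4-dim residual stream and a 6-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
w_u = np.zeros((4, 6))
w_u[0] = [0.5, -0.2, 0.9, -0.7, 0.1, 0.0]
w_dec_row = np.array([1.0, 0.0, 0.0, 0.0])  # decoder vector along dimension 0

negative, positive = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print(negative)
print(positive)
```

Because the decoder vector only points along dimension 0, the logit effects here are just the first row of `w_u`, which makes the ranking easy to check by hand.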
Activation density: 0.409%
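Activation density is usually read as the fraction of sampled token positions on which the feature fires, meaning its activation is above zero. This sketch assumes that definition and a threshold of 0; the sample the 0.409% figure was measured on is not stated on this page, and `activation_density` is a hypothetical helper. With 409 firing positions out of 100,000, the density is 0.00409, which formats as 0.409%.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy sample: 100,000 token positions, the feature fires on 409 of them.
acts = np.zeros(100_000)
acts[:409] = 1.0

density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```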