Explanations
… and activate on a specific pattern of characters, possibly tied to a particular language or text encoding
special characters or symbols in the text
Negative Logits
manif     -0.83
nesota    -0.80
espie     -0.77
ocene     -0.74
osc       -0.73
othal     -0.72
clair     -0.71
urus      -0.70
ossier    -0.68
neighb    -0.68
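The dashboard does not say how these logit values are produced. A common convention for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and then list the most negative and most positive vocabulary entries. The sketch below assumes that convention and uses tiny hypothetical weights in place of real model parameters. The function names `logit_weights` and `top_and_bottom` are illustrative, not part of any library.

```python
from typing import List, Tuple


def logit_weights(decoder_row: List[float], unembed: List[List[float]]) -> List[float]:
    """Dot a feature's decoder direction (length d_model) with every
    vocabulary column of a d_model x vocab unembedding matrix.
    Assumption: this is how the dashboard's logit values are derived."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    weights: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, weight) pairs."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1])
    positive = ranked[-k:][::-1]  # largest first
    negative = ranked[:k]         # most negative first
    return positive, negative


# Hypothetical toy values: d_model = 2, vocabulary of 3 tokens.
decoder_row = [1.0, -0.5]
unembed = [[0.2, -0.4, 0.9],
           [0.6, 0.2, -0.2]]
tokens = ["a", "b", "c"]

weights = logit_weights(decoder_row, unembed)
positive, negative = top_and_bottom(weights, tokens, k=1)
print("positive:", positive)
print("negative:", negative)
```

With real weights the same ranking step yields lists like the ones shown here; whether the dashboard also normalizes the decoder row first is not stated.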
Positive Logits
à¤        0.90
ł         0.89
天        0.89
âϦ        0.87
âķĲâķĲ    0.86
DOWN      0.85
ķ         0.84
Ĭ         0.84
å§        0.81
rans      0.80
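Most of the positive-logit tokens look like mojibake, but they are plausibly byte-level BPE token strings in which each raw byte is shown as a printable character. Assuming a GPT-2-style tokenizer (the dashboard does not name the model), the sketch below rebuilds that byte-to-character table and maps tokens back to their raw bytes. Under this assumption "à¤" and "å§" are the lead bytes of 3-byte UTF-8 characters (Devanagari and CJK ranges), "ł" and "Ĭ" are lone continuation bytes, and the box-drawing token âķĲâķĲ (the Ĳ ligature is easily flattened to "IJ" by text extraction) decodes to "══". Tokens such as "天" are already shown decoded and are not covered by the table.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed tokenizer)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Bytes without a printable form are shifted to code points 256 and up.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Invert the table so each displayed character maps back to its raw byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Convert a displayed byte-level token string back to raw bytes."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Show the raw bytes and, if they form complete UTF-8, the decoded text."""
    raw = token_to_bytes(token)
    try:
        return f"{token!r} -> {raw!r} -> {raw.decode('utf-8')!r}"
    except UnicodeDecodeError:
        return f"{token!r} -> {raw!r} (incomplete UTF-8 sequence)"


for tok in ["à¤", "å§", "ł", "Ĭ", "âķĲâķĲ"]:
    print(describe(tok))
```

This supports the "specific language or encoding" explanation: the feature appears to promote fragments of multi-byte UTF-8 characters rather than whole words.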
Activation Density: 0.023%
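Activation density is usually read as the percentage of token positions on which the feature fires. The dashboard does not state its threshold, so the sketch below assumes "active" means an activation strictly above zero, as with a ReLU-style sparse autoencoder. At 0.023%, that is roughly one token in every 4,300 (1 / 0.00023 ≈ 4,348). The helper `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold`
    (assumed definition; the dashboard does not specify it)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 23 firing positions out of 100,000 tokens.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"{density:.3f}%")
print(f"about 1 in {round(100.0 / density):,} tokens")
```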