Explanations
certain characters or character combinations within the text
occurrences of a specific character or symbol
Negative Logits
merce       -0.97
adolesc     -0.81
ufact       -0.77
carbohyd    -0.74
scrut       -0.74
auga        -0.74
ntil        -0.73
agre        -0.72
yip         -0.71
oun         -0.70
Positive Logits
×ķ          0.95
ł           0.92
ï¸ı         0.88
×Ļ×         0.87
Ñĥ          0.81
ãĥ¼ãĥ³      0.80
ा           0.78
ople        0.78
ķ           0.76
hens        0.75
Activations Density 0.008%