Explanations
unusual characters within text
specific characters or symbols present in the text
Negative Logits
olphin      -0.75
ioned       -0.72
WARD        -0.70
chilling    -0.68
ifference   -0.68
wards       -0.64
ensitive    -0.64
walk        -0.64
rained      -0.63
ITNESS      -0.63
Positive Logits
È    1.26
³    1.17
Ļ    1.09
Ľ    1.07
ł    1.02
ĺ    0.98
½    0.94
£    0.94
Ķ    0.91
ħ    0.91
Activations Density: 0.006%