Explanations
instances of the character "ľ" in the text
Negative Logits
Token               Logit
disadvant           -0.80
womb                -0.76
condem              -0.75
conduc              -0.74
manif               -0.74
slic                -0.73
altern              -0.73
apes                -0.73
mounts              -0.72
limb                -0.72
Positive Logits
Token               Logit
ï¸ı                 1.19
âĶĢâĶĢ              0.93
à¼                  0.92
\/\/                0.86
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    0.85
Edited              0.83
×Ķ                  0.83
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    0.81
ishable             0.79
ihad                0.79
Activations Density 0.088%
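
Several of the positive-logit tokens above (for example "âĶĢâĶĢ" and "×Ķ") look like raw vocabulary entries from a byte-level BPE tokenizer, where every byte is shown as a printable stand-in character. The sketch below assumes the GPT-2-style byte-to-character table; the dashboard does not name its tokenizer, so that is an assumption, and rows that are already ordinary text should not be passed through it. The function and constant names are illustrative only.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the GPT-2 byte-level BPE style."""
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte is assigned the next free code point from U+0100 up.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_vocab_token(token: str) -> str:
    """Map a raw vocabulary string back to the text its bytes encode.

    Assumes every character is one of the 256 stand-ins; any other
    character raises KeyError. Incomplete UTF-8 sequences decode to U+FFFD.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢâĶĢ", "×Ķ", "ï¸ı"]:
        decoded = decode_vocab_token(tok)
        # Show code points, since some results are invisible when printed.
        print(repr(tok), "->", [f"U+{ord(c):04X}" for c in decoded])
```

Under this assumption, "âĶĢâĶĢ" is two box-drawing horizontal lines (U+2500), "×Ķ" is the Hebrew letter he (U+05D4), and "ï¸ı" is the variation selector U+FE0F. The same table maps the "ľ" named in the explanation to the single byte 0x9C, which hints that the feature may respond to a raw byte rather than a visible character.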