INDEX
Explanations
numbers at the end of words or phrases
the character 'ľ' in the text
Negative Logits
Token        Logit
lapt         -0.77
disadvant    -0.76
scissors     -0.75
levers       -0.75
condem       -0.73
matic        -0.72
machines     -0.72
hemor        -0.71
raints       -0.71
sails        -0.70
Positive Logits
Token                       Logit
âĶĢâĶĢ                      1.32
ï¸ı                         1.09
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    1.07
âĶĢâĶĢâĶĢâĶĢ                0.99
conom                       0.89
°                           0.84
×                           0.82
(blank)                     0.81
×Ķ                          0.79
âĸł                         0.79
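
Several of the positive-logit tokens look like encoding debris, but they are consistent with the raw vocabulary strings of a byte-level BPE tokenizer. The sketch below assumes the tokens come from a GPT-2-style byte-level vocabulary (the page does not say which tokenizer is used) and shows how such strings could be mapped back to readable text. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer uses this same table.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point at 256 and above.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string back into text.

    Raises KeyError if the string contains a character outside the table.
    Tokens holding only part of a multi-byte UTF-8 sequence decode to U+FFFD.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĶĢâĶĢ", "ï¸ı", "âĸł", "×Ķ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `âĶĢ` is the box-drawing character `─` (U+2500), `âĸł` is `■` (U+25A0), `×Ķ` is the Hebrew letter `ה` (U+05D4), and `ï¸ı` is the invisible variation selector U+FE0F. The single-character tokens `×` and `°` would then be lone bytes (0xD7 and 0xB0) that only form a character together with neighbouring tokens.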
Activations Density 0.138%