Explanations
Roman numerals
occurrences of specific character patterns or symbols
Negative Logits
Token                       Logit
manif                       -0.87
disadvant                   -0.87
misunder                    -0.86
levers                      -0.83
Vaugh                       -0.80
vulner                      -0.78
geries                      -0.76
promoters                   -0.75
incorpor                    -0.74
fronts                      -0.74
Positive Logits
Token                       Logit
ï¸ı                         1.38
âĢº                         0.92
cffffcc                     0.90
âĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢâĶĢ    0.86
âĸ                          0.85
ש                           0.83
âĸ¬âĸ¬                      0.83
HUD                         0.82
×                           0.80
STAR                        0.80
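Several of the positive-logit tokens above (for example âĢº and âĸ¬âĸ¬) look like the printable stand-ins that GPT-2-style byte-level BPE tokenizers use for raw bytes, not the characters a reader would actually see. The page does not say which tokenizer produced them, so the following is only a sketch under that assumption. The helper name decode_token is hypothetical, and the byte table is the standard GPT-2 byte-to-character mapping.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAC + 1))
          + list(range(0xAE, 0xFF + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: map a byte-level token string back to UTF-8 text.

    Assumes the token uses the GPT-2 byte alphabet. Tokens containing any
    character outside that alphabet are returned unchanged, and incomplete
    byte sequences decode to the Unicode replacement character.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The escapes below spell out the tokens listed as âĢº and âĸ¬ in the table.
right_angle = decode_token("\u00e2\u0122\u00ba")
black_rect = decode_token("\u00e2\u0138\u00ac")
print(hex(ord(right_angle)), hex(ord(black_rect)))
```

Under this assumption the decoded tokens are mostly symbols and box-drawing characters, which would be consistent with the second explanation listed above. Tokens such as âĸ hold only part of a multi-byte character and cannot be decoded on their own.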
Activations Density: 0.106%