INDEX
Explanations
icons or symbolic representations
references to icons and iconography
Negative Logits
THER          -0.71
ndum          -0.71
nda           -0.69
EEE           -0.68
razil         -0.68
ood           -0.67
actory        -0.66
nder          -0.66
apons         -0.64
IGH           -0.64
Positive Logits
ocl           1.35
icon          1.05
ically        1.02
nect          1.00
ico           0.93
icon          0.90
icons         0.90
icons         0.90
ographically  0.84
onym          0.82
Activations Density 0.014%