INDEX
Explanations
binary sequences as well as specific words in uppercase
specific capitalized words and acronyms, likely related to locations or organizations
Negative Logits
crunch     -0.70
NPR        -0.70
代         -0.65
IDENT      -0.64
VIDEO      -0.61
AMERICA    -0.60
FIX        -0.60
incub      -0.59
RESULTS    -0.58
grading    -0.57
Positive Logits
tein       0.99
fen        0.80
glers      0.76
phia       0.75
ilts       0.75
levard     0.74
lain       0.71
dit        0.71
otte       0.70
tu         0.69
Activations Density: 0.185%
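
As a rough illustration of how to read the density figure: it is assumed here to be the fraction of sampled tokens on which the feature fires with a nonzero activation, which the dashboard itself does not spell out. The sketch below computes that fraction. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and synthetic, chosen only so that the result matches the 0.185% shown above. They are not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active (activation > threshold).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (not real data): 37 active tokens out of 20,000.
sample = [1.5] * 37 + [0.0] * (20_000 - 37)
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.185%"
```

Under this reading, a density of 0.185% means the feature is active on roughly 2 of every 1,000 tokens, so it is fairly sparse.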