Explanations
names or proper nouns
patterns of repeating letters or syllables in words
Negative Logits
shoulder      -0.72
prost         -0.72
multim        -0.71
bureaucr      -0.71
corridors     -0.70
Directions    -0.69
envelope      -0.67
DIRECT        -0.66
••            -0.65
rift          -0.64
Positive Logits
rex           0.99
ember         0.99
eware         0.98
atron         0.98
rill          0.98
bach          0.97
ador          0.96
itan          0.95
orah          0.93
artz          0.93
Activations Density: 0.163%