INDEX
Explanations
proper names
proper nouns, especially names
Negative Logits
20439      -0.82
scenes     -0.73
birds      -0.69
Tactics    -0.68
cells      -0.68
mallow     -0.67
bands      -0.66
sled       -0.65
clad       -0.62
Conduct    -0.62
Positive Logits
igible       0.85
Edu          0.83
undo         0.81
estine       0.80
aughed       0.80
ardo         0.78
ston         0.77
rative       0.76
ucer         0.74
itialized    0.73
Activations Density 0.029%