INDEX
Explanations
titles of books, movies, or historical events
phrases that contain the preposition "of"
Negative Logits
Token           Logit
eson            -0.92
ashtra          -0.81
ucci            -0.78
showc           -0.77
aciously        -0.68
diam            -0.68
respective      -0.67
versa           -0.66
formulations    -0.65
efully          -0.65
Positive Logits
Token           Logit
Excellence       1.08
Inquiry          0.91
Instruction      0.90
Hate             0.90
Light            0.86
Hats             0.86
Transformation   0.85
Execution        0.85
Darkness         0.85
Grind            0.84
Activation Density: 0.112%