Explanations
builds on existing concepts
NEGATIVE LOGITS
Token        Value
Majority     0.43
reprinted    0.40
majority     0.39
majority     0.39
Background   0.39
Consciousness 0.38
Underlying   0.38
History      0.38
Origin       0.37
*}           0.37
POSITIVE LOGITS
Token        Value
themes       0.65
themes       0.59
темы         0.55
concepts     0.55
concepts     0.53
conceptos    0.53
lessons      0.52
lessons      0.50
years        0.48
successes    0.48
Activations Density 0.030%