INDEX
Explanations
instances of personal research and exploration in various contexts
Negative Logits
isci        -0.17
edback      -0.14
Transcript  -0.14
kke         -0.14
/trunk      -0.13
kad         -0.13
indi        -0.13
važ         -0.13
iddi        -0.13
ocker       -0.12
Positive Logits
(not shown)   0.58
(not shown)   0.56
research      0.53
(not shown)   0.50
(not shown)   0.50
goog          0.49
(not shown)   0.48
(not shown)   0.46
researching   0.46
research      0.44
Activations Density: 0.410%