INDEX
Explanations
Mentions of influential figures or concepts in professional or academic contexts
Negative Logits
lds       -0.17
ocache    -0.14
Japanese  -0.14
orean     -0.14
Moines    -0.14
odzi      -0.14
Harness   -0.14
ampler    -0.14
JNI       -0.14
Japanese  -0.13
Positive Logits
Ace       0.20
Ace       0.17
Captain   0.17
Kurum     0.16
teamwork  0.16
ace       0.16
Gen       0.16
gen       0.15
Shadow    0.15
VS        0.15
Activations Density 0.003%