Explanations
References to significant historical figures or events
Negative Logits
Token   Logit
inson   -0.17
_Core   -0.16
ston    -0.16
wright  -0.15
dom     -0.14
htub    -0.14
eno     -0.14
engo    -0.14
acher   -0.14
CORE    -0.13
Positive Logits
Token   Logit
人は     0.19
로는     0.17
的是     0.16
shima   0.16
ectl    0.15
のは     0.15
ちは     0.15
apart   0.15
Lange   0.14
사는     0.14
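The two lists above are top-k rankings of per-token logit effects. Below is a minimal sketch of how such lists could be produced. It assumes, as is common for sparse autoencoder feature dashboards, that each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. This page does not state that, so treat it as an assumption. `logit_effects` and `top_logits` are hypothetical helpers, and the toy matrices are illustrative, not taken from any model.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score every vocabulary token by projecting a feature's decoder
    direction onto the unembedding matrix (hypothetical helper).

    decoder_vec: d_model floats, the feature's decoder row (assumed input)
    unembed: d_model rows, each holding one float per vocabulary token
    vocab: token strings, one per unembedding column
    """
    if len(unembed) != len(decoder_vec):
        raise ValueError("decoder_vec and unembed must share d_model")
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(w * row[j] for w, row in zip(decoder_vec, unembed))
        scores.append((token, score))
    return scores


def top_logits(scores, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    vocab = ["a", "b", "c"]
    unembed = [[1.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]
    decoder_vec = [0.5, 0.2]
    neg, pos = top_logits(logit_effects(decoder_vec, unembed, vocab), k=1)
    print(neg, pos)  # [('c', -0.5)] [('a', 0.5)]
```

Sorting once and slicing from both ends yields both lists in a single pass. With a real vocabulary, k=10 would give two ten-row lists shaped like the ones above.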
Activation density: 0.107%
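Activation density is the share of tokens on which the feature fires. A value of 0.107% works out to roughly one token in every 935 (1 / 0.00107 ≈ 934.6). The sketch below computes this fraction, assuming "fires" means an activation strictly above zero. The page states neither the threshold nor the size of the token sample, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper).

    Assumes a token counts as active when its activation is strictly greater
    than the threshold; the real dashboard's rule may differ.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # 107 active tokens out of 100,000 sampled gives the density shown above.
    sample = [1.0] * 107 + [0.0] * 99_893
    print(f"{activation_density(sample):.3%}")  # 0.107%
```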