Explanations
mentions of the word "ang"
mentions of orangutans
Negative Logits
Token        Logit
Pwr          -0.78
isot         -0.74
Seym         -0.73
nesday       -0.72
merga        -0.72
earcher      -0.68
everal       -0.65
ngth         -0.65
ouple        -0.65
electroly    -0.65
Positive Logits
Token        Logit
aroo          1.14
ang           1.03
angs          1.00
etsu          1.00
lia           0.86
bang          0.85
lers          0.83
ethe          0.83
yang          0.83
gang          0.79
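The two lists rank tokens by their logit value for this feature, largest positive first and most negative first. A minimal sketch of that ranking step follows, assuming each token already has a single scalar score (for example, the feature's direct effect on that token's output logit). The function name `top_and_bottom_logits` is hypothetical, and the input dict reuses a few values from the lists rather than a real model.

```python
import heapq
from typing import Dict, List, Tuple

def top_and_bottom_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring tokens.

    Assumption: `effects` maps each token to one precomputed logit score;
    how that score is computed is outside this sketch.
    """
    top = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])      # most positive first
    bottom = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])  # most negative first
    return top, bottom

# Small subset of the listed values, used only as illustrative input.
effects = {"aroo": 1.14, "ang": 1.03, "angs": 1.00,
           "Pwr": -0.78, "isot": -0.74, "Seym": -0.73}
top, bottom = top_and_bottom_logits(effects, 2)
print(top)     # [('aroo', 1.14), ('ang', 1.03)]
print(bottom)  # [('Pwr', -0.78), ('isot', -0.74)]
```

Using `heapq.nlargest` and `heapq.nsmallest` avoids a full sort when only the top and bottom few entries are needed from a large vocabulary.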
Activation density: 0.013%
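A density of 0.013% corresponds to about 13 of every 100,000 tokens. The sketch below assumes density means the percentage of tokens on which the feature's activation is strictly positive. The function name `activation_density` is hypothetical, and the activation list is synthetic data built to match that rate, not values from this feature.

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of tokens whose activation is strictly positive.

    Assumption: "firing" means activation > 0; a dashboard could use a
    different threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Synthetic example: 13 firing tokens out of 100,000.
acts = [0.0] * 99_987 + [0.5] * 13
print(activation_density(acts))  # 0.013
```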