INDEX
Explanations
terms related to evaluation or comparison of experiences
Negative Logits
iaux        -0.16
undi        -0.15
каж         -0.14
Ĺi          -0.14
appiness    -0.14
olik        -0.14
taÅŁ        -0.13
½Ķ          -0.13
loff        -0.13
gn          -0.13
Positive Logits
ever        1.37
ever        1.05
-ever       1.04
EVER        0.99
Ever        0.95
Ever        0.90
jamais      0.61
EVER        0.60
soever      0.44
Everett     0.42
Activations Density 0.197%
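
As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the fraction of token positions where the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition (not taken from the dashboard): a position counts
    as active when its activation is strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 197 active positions out of 100,000 tokens.
sample = [1.0] * 197 + [0.0] * 99_803
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.197%
```

Under this assumed definition, a density of 0.197% corresponds to roughly 197 active positions per 100,000 tokens.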