INDEX
Explanations
statements regarding the significance or relevance of various subjects
Negative Logits
OK        -0.15
ok        -0.15
forgot    -0.15
Heard     -0.14
azen      -0.14
ãĤ¤ãĥ¤    -0.14
aldi      -0.13
orgot     -0.13
emie      -0.13
Cove      -0.13
Positive Logits
ham         0.25
becoming    0.23
receiving   0.21
gaining     0.18
correspond  0.18
vari        0.17
pos         0.17
attracting  0.17
convention  0.17
keys        0.16
Activations Density 0.211%
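
The negative-logit token `ãĤ¤ãĥ¤` looks like the printable byte-level form that GPT-2-style BPE tokenizers use for non-ASCII text, not corrupted output. The sketch below assumes that encoding applies here (the source does not name the tokenizer) and maps the displayed characters back to raw bytes before decoding them as UTF-8. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to an unused code point from 256 upward.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(cp) for b, cp in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Decode a token as displayed in byte-level form (assumed encoding).

    Raises KeyError if the string contains a character outside the table,
    for example a literal space.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    # Partial UTF-8 sequences become U+FFFD rather than raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ¤ãĥ¤"))  # prints イヤ
```

Under that assumption the token decodes to the katakana pair イヤ. Plain ASCII tokens such as `ham` pass through unchanged.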