Explanations
affirmative or certain statements
Negative Logits
nick      -0.16
anca      -0.15
agan      -0.14
ạt        -0.14
atego     -0.14
ocking    -0.14
leta      -0.14
sana      -0.14
ああ      -0.14
anim      -0.13
Positive Logits
ech       0.17
otte      0.15
ume       0.15
fewer     0.14
FD        0.14
forth     0.14
014       0.14
296       0.14
ignum     0.14
ificates  0.14
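The logit lists above can be read as the feature's direct effect on the output vocabulary. Below is a minimal sketch of how such lists might be produced. It assumes, though this page does not say so, that each value is the feature's decoder direction projected through the model's unembedding matrix and then sorted. The names `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical.

```python
from typing import List, Sequence, Tuple

def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction onto the unembedding and
    return the k most negative and k most positive (token, value) pairs."""
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # logits[t] = sum over i of decoder_dir[i] * unembed[i][t]
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    [0.5, 0.2],
    [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]],
    ["a", "b", "c"],
    k=1,
)
print(neg)  # [('c', -0.5)]
print(pos)  # [('a', 0.5)]
```

In a real model, `decoder_dir` would be one row of the autoencoder's decoder weights and `unembed` the model's output embedding. A vectorized matrix product would replace the nested loop.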
Activation density: 0.017%
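A density of 0.017% means the feature fires on roughly 17 of every 100,000 tokens. The following is a minimal sketch of that calculation. It assumes, without confirmation from this page, that density is the fraction of token positions where the feature's activation is strictly greater than zero. `activation_density` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total

# Example: 17 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(17):
    acts[i * 5000] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.017%
```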