INDEX
Explanations
phrases questioning the logic and efficacy of ideas and actions
Negative Logits
Token         Logit
apol          -0.17
άνÏī          -0.16
alink         -0.16
')==          -0.15
ÐIJÑĢÑħÑĸв     -0.14
erif          -0.14
canon         -0.14
canonical     -0.14
надлеж        -0.14
chein         -0.14
Positive Logits
Token         Logit
could         1.09
could         0.96
Could         0.93
Could         0.88
kunne         0.55
могли         0.54
konnte        0.51
могла         0.51
CO            0.51
мог           0.46
Activations Density: 0.464%