Explanations
instances of reported beliefs or thoughts expressed in various forms
Negative Logits
Token            Logit
awtextra         -0.64
ElementRef       -0.61
InjectAttribute  -0.61
chi̍t             -0.61
Chwiliwch        -0.59
réessayer        -0.58
(blank)          -0.56
ίδα              -0.53
Ause             -0.53
yssey            -0.53
Positive Logits
Token            Logit
Wird             0.81
Wird             0.79
enkelte          0.70
yapılan          0.70
mennes           0.69
tempio           0.67
inimes           0.66
brukes           0.64
eaten            0.63
verrà            0.62
Activations Density: 0.459%