Explanations
positive affirmations and acceptances
Negative Logits
Token  Value
የወ  0.70
Helm  0.64
nontrivial  0.64
iosos  0.64
𝕦  0.63
অভাব  0.61
𝓌  0.61
conse  0.60
gnię  0.59
섹  0.59
Positive Logits
Token  Value
okay  4.51
ok  4.28
OK  4.26
Ok  3.93
OK  3.83
Ok  3.81
Okay  3.81
fine  3.80
okay  3.65
alright  3.64
Activations Density: 0.606%
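
The page does not say how the density figure is computed. One common convention, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that convention on toy data; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 1000 token positions, 6 of which are active.
toy = [0.0] * 994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.600%"
```

Under this reading, a density of 0.606% would mean the feature fires on roughly 6 of every 1,000 sampled tokens.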