INDEX
Explanations
expressions of well-being or positive feelings
Negative Logits
Token        Logit
Efq          -0.74
RunWith      -0.73
المعيارى     -0.72
fhort        -0.66
massively    -0.63
fhew         -0.62
köz          -0.61
Зноскі       -0.61
wapV         -0.61
myſelf       -0.60
Positive Logits
Token        Logit
Good         1.47
Good         1.41
good         1.40
good         1.36
GOOD         1.36
GOOD         1.35
bien         1.02
buen         0.99
buena        0.90
goodness     0.90
Activations Density: 0.241%