Explanations
expressions of agreement or approval
Negative Logits
Token         Logit
vermögen      -0.53
Leip          -0.47
まさに        -0.47
Loyalty       -0.47
Ors           -0.47
ORN           -0.47
еремо         -0.46
réputation    -0.46
keres         -0.46
ेर            -0.46
Positive Logits
Token         Logit
okay          1.78
alright       1.75
OK            1.68
OKAY          1.60
ok            1.59
Alright       1.56
alright       1.56
okay          1.52
Okay          1.46
Okay          1.46
Activations Density: 0.224%