Explanations
expressions of acknowledgment and support for initiatives or policies
Negative Logits
vier        -0.14
keş         -0.13
oref        -0.13
nem         -0.12
rou         -0.12
Scenario    -0.12
masında     -0.12
iens        -0.12
erst        -0.12
ayscale     -0.12
Positive Logits
attempts    0.68
attempt     0.67
effort      0.65
efforts     0.64
Attempts    0.59
Attempt     0.59
attempt     0.57
Attempt     0.54
Attempts    0.51
attempting  0.50
Activations Density 0.004%