INDEX
Explanations
affirmative statements about ongoing or enduring situations
Negative Logits
offsetof       -0.65
eneuve         -0.63
[]             -0.63
estacks        -0.58
coledì         -0.58
)];            -0.58
[]:            -0.57
letics         -0.57
ilung          -0.57
PhysRevLett    -0.57
Positive Logits
still          1.09
Still          1.05
fortfarande    1.05
still          1.03
Still          1.00
todavía        0.84
fortsatt       0.84
STILL          0.84
ainda          0.83
stadig         0.81
Activations Density 0.287%