INDEX
Explanations
statements expressing negation or contradiction
Negative Logits
actionMode       -0.53
ostavi           -0.52
UniformLocation  -0.52
Vaux             -0.46
انجليز           -0.46
ويكيميديا        -0.46
referrerpolicy   -0.45
recevrez         -0.44
้งาน             -0.44
izd              -0.43
Positive Logits
forget           1.00
misunderstand    0.88
worry            0.74
misunderstood    0.73
Forget           0.69
forgetting       0.68
Donny            0.67
FetchType        0.66
AnchorStyles     0.65
forget           0.65
Activations Density: 0.048%