Explanations
Questions and negative statements
Negative Logits
antMatchers    -1.12
שוליים         -0.99
itſelf         -0.88
Composable     -0.88
Rhonda         -0.88
Grau           -0.88
InputBorder    -0.86
Houſe          -0.86
Ise            -0.86
houſe          -0.85
Positive Logits
did            1.73
Did            1.70
did            1.67
DID            1.64
Did            1.54
DID            1.48
Didi           1.21
didn           1.04
Didn           1.01
done           0.98
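The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how the values are computed; the sketch below assumes each value is the dot product of the feature's decoder direction with a token's unembedding vector, then sorts once and takes both ends. The functions `logit_effects` and `top_and_bottom`, and the two-dimensional toy vectors, are hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed definition: a token's value is the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]                                    # most negative first
    positive = ranked[::-1][:k]                              # most positive first
    return negative, positive


# Hypothetical toy example: every vector and token here is made up.
decoder_dir = [1.0, -0.5]
unembed = {
    "did": [1.5, -0.4],
    "Did": [1.2, -0.6],
    "done": [0.8, 0.0],
    "house": [-0.7, 0.3],
    "Grau": [-0.6, 0.4],
}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a vectorized matrix product with a partial sort would be the practical choice.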
Activation Density: 0.091%
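An activation density of 0.091% means the feature fires on roughly 9 tokens out of every 10,000 sampled. A minimal sketch of that calculation, assuming density is the fraction of tokens whose activation is strictly above zero (the threshold and the hypothetical `activation_density` helper are assumptions, not taken from the page):

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy check with made-up activations: 9 active tokens out of 10,000.
acts = [0.0] * 9991 + [1.3] * 9
print(f"{activation_density(acts):.3%}")
```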