INDEX
Explanations
assertions about beliefs and opinions
Negative Logits
isn       -1.10
aren      -1.10
weren     -1.04
shouldn   -1.04
hasn      -0.99
doesn     -0.99
wasn      -0.98
Couldn    -0.96
Doesn     -0.96
doesn     -0.96
Positive Logits
cannot            0.75
cannot            0.71
Cannot            0.68
Cannot            0.65
Tetapi            0.50
principalTable    0.47
tidak             0.46
did               0.46
tetapi            0.43
但是              0.42
Activations Density 0.369%