Explanations
phrases indicating impossibility or prohibition
phrases that express inability or restrictions
Negative Logits
Token              Logit
ancer              -0.65
casting            -0.65
Generation         -0.65
Five               -0.63
liter              -0.62
bats               -0.62
ieu                -0.61
offs               -0.60
Loaded             -0.60
(no token shown)   -0.59
Positive Logits
Token         Logit
ĸļ            1.11
reproduce     0.89
necessarily   0.88
afford        0.87
adian         0.86
berra         0.85
't            0.85
exceed        0.83
attest        0.80
Canaver       0.79
Activations Density 0.020%
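
To make the density figure concrete, here is a minimal sketch of how such a value could be computed, assuming that "activations density" means the fraction of sampled token positions on which the feature fires (activation above zero). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled token positions
    where the feature is active, not a statistic of activation size.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 10,000 token positions.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.020%
```

Under that assumption, a density of 0.020% corresponds to roughly one active position in every 5,000 sampled tokens.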