INDEX
Explanations
phrases that express conditions or consequences
conditional statements or phrases that introduce hypothetical scenarios
Negative Logits
enos      -0.68
akia      -0.67
Pledge    -0.66
hs        -0.64
azo       -0.63
urga      -0.62
FTWARE    -0.59
Gov       -0.59
Fixes     -0.59
Sins      -0.59
Positive Logits
fy           1.21
indeed       0.99
ever         0.97
anything     0.95
not          0.95
tar          0.89
rame         0.88
possible     0.83
ever         0.79
unchecked    0.77
Activation Density: 0.064%