INDEX
Explanations
phrases emphasizing contrast or preference
phrases that express negation or contrastive ideas
Negative Logits
Norn        -0.68
nec         -0.67
omatic      -0.64
CLASSIFIED  -0.64
clearance   -0.63
CF          -0.63
ced         -0.62
}}}         -0.62
cia         -0.62
Grounds     -0.61
Positive Logits
reinvent     0.93
rely         0.87
blindly      0.86
succumb      0.85
relying      0.84
necessarily  0.84
simply       0.83
merely       0.83
speculate    0.81
rahim        0.78
Activations Density 0.156%