INDEX
Explanations
phrases or sentences indicating exclusivity or limitation
phrases indicating exclusivity or limitation
Negative Logits
idon        -0.77
ducers      -0.67
insula      -0.65
phen        -0.63
PI          -0.60
hement      -0.58
mire        -0.58
Dynamics    -0.57
stice       -0.57
Cot         -0.56
Positive Logits
marginally  1.06
lasted      0.99
scratched   0.92
kidding     0.86
cared       0.85
allowed     0.83
cares       0.79
scratches   0.78
lasts       0.77
pretended   0.76
Activations Density 0.057%