Explanations
phrases indicating a restriction or limitation to specific conditions
instances of the word "only" in various contexts
Negative Logits
Token             Logit
git               -0.65
idon              -0.63
ysis              -0.62
ffen              -0.58
liner             -0.58
cel               -0.57
insula            -0.57
claimer           -0.56
understatement    -0.56
duino             -0.55
Positive Logits
Token             Logit
marginally        0.97
insofar           0.96
spor              0.90
onse              0.85
intermitt         0.82
incidentally      0.82
occasionally      0.80
ones              0.75
temporarily       0.75
partially         0.74
Activations Density: 0.062%