INDEX
Explanations
phrases or sentences emphasizing the exclusivity or uniqueness of something
repeated phrases emphasizing limitations or conditions related to "only"
Negative Logits
ducers    -0.68
hement    -0.68
ne        -0.64
ongs      -0.64
alam      -0.64
idon      -0.63
psc       -0.63
wealth    -0.63
tails     -0.61
rigan     -0.60
POSITIVE LOGITS
marginally    1.17
kidding       0.83
partially     0.82
accessible    0.82
scratched     0.80
slightly      0.77
moderately    0.76
halfway       0.75
temporary     0.74
scratching    0.74
Activations Density 0.054%
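
A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires at all. The page does not say how it was computed, so the sketch below is an assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, and the counts are chosen only to illustrate a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation. The source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, feature active on 54 of them.
sample = [0.0] * 99_946 + [1.0] * 54
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which fits a feature tied to a narrow pattern such as uses of "only".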