INDEX
Explanations
various phrases related to ways, perspectives, and comparisons
expressions of judgment or critique regarding arguments or concepts
Negative Logits
anwhile     -0.71
haar        -0.69
quished     -0.67
selected    -0.61
erson       -0.61
lain        -0.60
achus       -0.59
hma         -0.59
yles        -0.59
abella      -0.57
Positive Logits
sense        0.96
standpoint   0.94
terms        0.86
nutshell     0.80
senses       0.80
respects     0.73
perspective  0.72
ways         0.71
Terms        0.71
reasons      0.66
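The two logit lists read as a top-k and bottom-k selection over per-token values. The following is a minimal sketch, assuming the values sit in a plain token-to-value mapping. It uses only the 20 entries shown here, not the full vocabulary the dashboard presumably ranks over, and `top_and_bottom` is a hypothetical helper, not the dashboard's own code.

```python
# Logit values copied from the two lists; keys are raw subword tokens,
# so fragments such as "anwhile" are kept exactly as displayed.
LOGITS = {
    "anwhile": -0.71, "haar": -0.69, "quished": -0.67, "selected": -0.61,
    "erson": -0.61, "lain": -0.60, "achus": -0.59, "hma": -0.59,
    "yles": -0.59, "abella": -0.57,
    "sense": 0.96, "standpoint": 0.94, "terms": 0.86, "nutshell": 0.80,
    "senses": 0.80, "respects": 0.73, "perspective": 0.72, "ways": 0.71,
    "Terms": 0.71, "reasons": 0.66,
}


def top_and_bottom(logits: dict, k: int) -> tuple:
    """Return (top, bottom): the k highest- and k lowest-valued (token, value) pairs.

    Python's sort is stable even with reverse=True, so tied tokens keep
    their insertion order in both lists.
    """
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(logits.items(), key=lambda kv: kv[1])[:k]
    return top, bottom


top, bottom = top_and_bottom(LOGITS, 3)
print(top)     # [('sense', 0.96), ('standpoint', 0.94), ('terms', 0.86)]
print(bottom)  # [('anwhile', -0.71), ('haar', -0.69), ('quished', -0.67)]
```

Sorting the whole mapping is fine at this size. For a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` avoid a complete sort.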
Activation Density: 0.487%
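The density figure can be read as a simple percentage. This is a minimal sketch, assuming density means the share of dataset tokens on which the feature's activation exceeds zero. The function name and the `threshold` parameter are hypothetical illustrations, not the dashboard's documented definition.

```python
def activation_density(activations: list, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of four gives 25%.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```

Under this reading, 0.487% corresponds to roughly 487 active tokens per 100,000, so the feature fires sparsely.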