INDEX
Explanations
phrases related to states of mind or mental conditions
phrases that convey states of being or existence
Negative Logits
arently     -0.74
oother      -0.73
eeper       -0.70
direct      -0.69
arent       -0.68
Unsure      -0.68
sqor        -0.66
ombies      -0.64
orous       -0.64
untarily    -0.64
Positive Logits
sorts        0.84
juven        0.79
Buddhism     0.77
humankind    0.77
bip          0.75
mankind      0.74
feminism     0.73
masculinity  0.72
theirs       0.71
humanity     0.71
Activations Density 0.446%