INDEX
Explanations
phrases related to actions or desires
phrases indicating possession or the act of having something
Negative Logits
Token       Logit
ilon        -0.67
Bounty      -0.65
coward      -0.62
Kov         -0.62
Line        -0.60
adj         -0.60
hooting     -0.59
Nort        -0.58
gradient    -0.57
\/\/        -0.57
Positive Logits
Token           Logit
conversation    0.78
impact          0.76
haircut         0.76
rethink         0.73
icum            0.67
oided           0.66
WRITE           0.66
ipolar          0.65
readable        0.65
ctic            0.65
Activations Density: 0.240%