INDEX
Explanations
instances where an action is being given or directed towards something
phrases that involve giving something attention or consideration
Negative Logits
ater        -0.73
nels        -0.73
nel         -0.71
amiya       -0.70
yrinth      -0.70
asses       -0.69
alde        -0.68
ablish      -0.66
BILITIES    -0.64
Ship        -0.63
Positive Logits
hum           0.73
priority      0.71
goodbye       0.71
legitimacy    0.68
airs          0.66
enthusi       0.65
kell          0.65
immortality   0.63
Goodbye       0.63
andom         0.63
Activations Density 0.166%