Explanations
action verbs indicating performance or behavior
phrases that reference actions or behaviors being discussed
Negative Logits
usk          -0.72
isphere      -0.65
addon        -0.64
selection    -0.61
inen         -0.61
atto         -0.60
iHUD         -0.60
irl          -0.59
osa          -0.58
apsed        -0.58
Positive Logits
deserve      1.13
offend       1.07
mattered     1.02
aren         1.00
resonate     0.95
weren        0.93
nobody       0.93
enrich       0.92
rouse        0.91
shouldn      0.90
Activations Density: 0.188%