INDEX
Explanations
numerical quantities or emphasis
phrases indicating personal perceptions or interpretations
Negative Logits
impl         -0.63
igr          -0.63
worthy       -0.61
esian        -0.61
endeavor     -0.59
abin         -0.58
worthiness   -0.57
endeavour    -0.57
ridge        -0.56
demol        -0.55
Positive Logits
rid                     1.18
tin                     0.94
bored                   0.92
DragonMagazine          0.86
lucky                   0.84
tired                   0.81
TING                    0.78
cloneembedreportprint   0.78
reimb                   0.77
rewarded                0.75
Activations Density: 0.120%