Explanations
terms indicating evaluation or judgement
phrases that indicate something is regarded or viewed in a certain way
Negative Logits
Token       Logit
hire        -0.86
dayName     -0.82
inas        -0.76
ionage      -0.71
LET         -0.65
bombard     -0.65
aptic       -0.65
mouth       -0.65
erers       -0.64
vous        -0.64

Positive Logits
Token       Logit
phas        0.90
synonymous  0.90
taboo       0.84
worthy      0.83
kosher      0.83
canon       0.82
infall      0.77
unworthy    0.75
harmless    0.74
sacred      0.74
Activations Density: 0.039%