INDEX
Explanations
phrases related to capabilities or objectives
terms related to analytical evaluations and assessments
NEGATIVE LOGITS
ctors       -0.73
ModLoader   -0.73
condem      -0.64
dule        -0.64
SD          -0.61
utters      -0.57
english     -0.57
usha        -0.57
idth        -0.57
prus        -0.57
POSITIVE LOGITS
firsthand   0.98
ourselves   0.91
myself      0.85
yourself    0.79
vividly     0.74
empir       0.73
alone       0.73
yourselves  0.73
lessness    0.70
themselves  0.69
Activations Density 0.274%
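
The page does not define the density figure above. A common reading, assumed here, is the fraction of token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on 3 of them.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.274% would mean the feature is active on roughly 27 of every 10,000 token positions.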