Explanations
words related to power dynamics and manipulation
instances of numerical values or quantities and their implications in various contexts
Negative Logits
intend       -0.71
angler       -0.67
appropri     -0.67
rapt         -0.63
respons      -0.63
rament       -0.63
pastoral     -0.61
vanishing    -0.60
charm        -0.59
body         -0.59
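The list above ranks tokens by how strongly the feature suppresses their logits, and the positive list that follows does the same for promoted tokens. One common way to produce such lists, assumed here rather than stated on this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k lowest and k highest scores. The sketch uses pure Python with a hypothetical five-token vocabulary and made-up 2-D vectors; `logit_effects` and `top_and_bottom` are illustrative names, not a library API.

```python
# Minimal sketch (assumption): a feature's logit effect on a token is the dot
# product of the feature's decoder direction with that token's unembedding
# vector. All vectors and names below are hypothetical toy data.

def logit_effects(decoder_dir, unembed, vocab):
    """Return {token: score}, where score = <decoder_dir, unembed[token]>."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, unembed[tok]))
        for tok in vocab
    }

def top_and_bottom(effects, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

vocab = ["Explicit", "Lastly", "intend", "angler", "body"]
unembed = {  # hypothetical 2-D unembedding vectors
    "Explicit": [1.0, 0.5],
    "Lastly": [0.8, 0.2],
    "intend": [-0.9, -0.3],
    "angler": [-0.6, -0.4],
    "body": [0.1, -0.2],
}
decoder_dir = [0.7, 0.3]  # hypothetical decoder direction

effects = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
print(positive)
print(negative)
```

With real weights the vectors would have the model's hidden size and the vocabulary would be the full tokenizer vocabulary; the ranking logic stays the same.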
Positive Logits
Explicit             0.88
Lastly               0.76
æĺ¯                  0.75
Languages            0.75
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     0.74
à¨                   0.74
FORE                 0.71
س                    0.69
anguages             0.68
Disclaimer           0.68
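Some of these tokens, such as `æĺ¯` and `à¨`, look like GPT-2-style byte-level BPE strings, in which every raw byte is mapped to a printable Unicode character. Assuming the model uses that scheme (the page does not say so), the mapping can be inverted to recover the underlying bytes: `æĺ¯` is the three UTF-8 bytes of 是, while `à¨` is only the first two bytes of a three-byte character in the Gurmukhi block and cannot be decoded on its own. The sketch reproduces the `bytes_to_unicode` table from OpenAI's GPT-2 `encoder.py`; `token_bytes` and `decode_token` are hypothetical helper names.

```python
def bytes_to_unicode():
    """GPT-2's byte -> printable-character table (mirrors encoder.py)."""
    # Bytes that are already printable Latin-1 characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def token_bytes(token):
    """Raw bytes behind a byte-level BPE token string (hypothetical helper)."""
    return bytes(BYTE_DECODER[ch] for ch in token)

def decode_token(token):
    """Decode to text; invalid or truncated UTF-8 becomes U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")

print(decode_token("æĺ¯"))  # 是
print(token_bytes("à¨"))    # b'\xe0\xa8'
```

Tokens the page already shows in decoded form, such as `س`, are not byte-level strings, so `token_bytes` would raise `KeyError` on them.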
Activation Density: 0.048%
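Activation density is the share of token positions on which the feature is active. Assuming "active" means a strictly positive activation (a dashboard may use a different threshold), a density of 0.048% corresponds to about 48 firing positions per 100,000 tokens. A minimal sketch with hypothetical data; `activation_density` is an illustrative name, not a library function.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires.

    Assumption: "fires" means activation > threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical data: 48 firing positions out of 100,000 tokens.
acts = [0.0] * 99_952 + [1.3] * 48
density = activation_density(acts)
print(f"{density:.3%}")  # 0.048%
```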