INDEX
Explanations
technical language related to the physical world
instances of requests or prompts related to actions or decisions made by individuals or groups
Negative Logits
é¾          -0.81
CLUD        -0.76
OND         -0.72
hest        -0.65
foundland   -0.65
IQ          -0.64
worthy      -0.62
URI         -0.58
KN          -0.58
Condition   -0.57
Positive Logits
Instead      1.15
Instead      1.08
instead      1.07
instead      0.99
Rather       0.86
Rather       0.83
preferring   0.78
anymore      0.77
opting       0.75
let          0.75
Activations Density 0.094%