INDEX
Explanations
phrases related to providing guidance or instructions
phrases describing the easiest methods or approaches to accomplish tasks
Negative Logits
Token      Logit
IT         -0.53
AA         -0.51
ISS        -0.51
serious    -0.50
ting       -0.49
CE         -0.49
vik        -0.49
INA        -0.48
embed      -0.47
than       -0.47
Positive Logits
Token      Logit
easiest    3.04
simplest   2.55
quickest   2.36
cheapest   2.26
safest     2.18
shortest   1.74
smallest   1.73
hardest    1.69
happiest   1.62
liest      1.60
Activations Density 0.013%