INDEX
Explanations
phrases indicating a cause-and-effect relationship or dependency on certain conditions
phrases indicating conditionality or dependency
Negative Logits
token       logit
vision      -0.82
spir        -0.76
anas        -0.73
oop         -0.71
iverpool    -0.71
jc          -0.70
tha         -0.70
ãĥīãĥ©      -0.70
OWN         -0.69
shr         -0.69
Positive Logits
token         logit
upon          0.79
awaru         0.78
critically    0.78
heavily       0.78
ymm           0.75
solely        0.72
adversely     0.69
principally   0.69
depend        0.68
chiefly       0.68
Activations Density 0.016%
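
The density figure above can be read as the share of token positions on which the feature fires. The page does not state its exact definition, so the sketch below assumes density means the fraction of sampled positions whose activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    fires at all; the dashboard does not spell out its own definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, feature fires on 2 of them.
acts = [0.0] * 10_000
acts[17] = 3.2
acts[4242] = 1.1

density = activation_density(acts)
print(f"{density:.3%}")  # formatted as a percentage, like the dashboard value
```

Under this reading, a density of 0.016% would correspond to the feature firing on roughly 16 of every 100,000 sampled token positions.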