INDEX
Explanations
the presence of the word "exist" and its variations in various contexts
existence and reliance
Negative Logits
s         -0.70
ebs       -0.52
_{-\      -0.52
WS        -0.52
ws        -0.52
Daniels   -0.49
EDES      -0.47
Autos     -0.47
כס        -0.46
gras      -0.46
Positive Logits
contain   0.78
Contain   0.74
exist     0.66
depend    0.65
depend    0.65
Contain   0.65
contain   0.64
Depend    0.63
Exist     0.62
Depend    0.57
Activations Density 0.024%