INDEX
Explanations
phrases related to explaining the reasoning or motivation behind something
phrases indicating motivation or reasoning
Negative Logits
ander       -0.85
Pwr         -0.75
issan       -0.73
cki         -0.72
aire        -0.71
idential    -0.69
istic       -0.69
ennes       -0.69
20439       -0.67
alam        -0.66
Positive Logits
▬▬          0.74
bars        0.70
behind      0.68
closed      0.67
plates      0.67
why         0.66
wards       0.66
WHY         0.65
closed      0.65
byn         0.63
Activation Density: 0.017%