Explanations
phrases indicating a generalized concept or unspecified thing
references to the concept of "anything" or unspecified subjects
Negative Logits
fer             -0.72
irth            -0.70
anonymity       -0.67
ardi            -0.67
arb             -0.66
nec             -0.65
asio            -0.64
ritz            -0.63
Encyclopedia    -0.63
ulators         -0.62
Positive Logits
else            1.69
Else            1.35
resembling      1.26
Else            1.10
else            0.97
THING           0.94
remotely        0.94
imaginable      0.93
whatsoever      0.88
happens         0.85
Activations Density 0.046%
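The page does not define the density figure. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 10,000 token positions, 5 of which activate the feature.
toy_activations = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.046% would mean the feature fires on roughly 46 out of every 100,000 token positions, which fits a feature tied to a narrow set of phrases such as "anything else" or "whatsoever".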