INDEX
Explanations
phrases that discuss existence or states of being
Negative Logits
Token       Logit
ngth        -1.00
afia        -0.75
fect        -0.72
ividual     -0.70
umbnails    -0.69
udeb        -0.67
elve        -0.66
hail        -0.66
onder       -0.66
andy        -0.65
Positive Logits
Token          Logit
happening      1.21
untrue         0.94
happen         0.89
occurring      0.89
true           0.89
true           0.86
TRUE           0.84
happened       0.82
achievable     0.80
accomplished   0.79
Activations Density: 0.152%