Explanations
different variations of the term "BL" with decreasing activations
references to a specific television or media series
NEGATIVE LOGITS
enegger     -0.84
mble        -0.81
framework   -0.76
aeda        -0.75
tainment    -0.73
gerald      -0.73
geist       -0.69
duc         -0.68
mens        -0.67
Gund        -0.67
POSITIVE LOGITS
ACK          1.01
OOD          1.00
adder        0.96
ighting      0.96
anca         0.96
anche        0.93
OCK          0.91
umenthal     0.89
ADE          0.86
AST          0.85
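The page does not say how these logit lists were computed. A common approach in sparse-autoencoder tooling, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive tokens. The sketch below illustrates that ranking with toy vectors. The names `logit_effects`, `top_bottom_k`, `decoder_dir` and `unembed` are hypothetical, and the numbers are not the page's real weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed method: dot product of the feature's decoder direction
    with each token's unembedding vector (hypothetical inputs)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_bottom_k(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (most positive k, most negative k) tokens by effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy example: two-dimensional vectors, purely illustrative.
decoder_dir = [1.0, -0.5]
unembed = {"ACK": [0.8, -0.4], "OOD": [0.5, 0.2], "mble": [-0.6, 0.4]}
top, bottom = top_bottom_k(logit_effects(decoder_dir, unembed), k=1)
print(top, bottom)
```

With these toy vectors "ACK" ranks highest (about 1.0) and "mble" lowest (about -0.8), mirroring how each list above is ordered by value.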
Activation Density: 0.008%
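Activation density is usually read as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the page does not state its threshold or sample size. The function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; threshold 0 by default)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# One active position out of 12,500 corresponds to a density of 0.008%.
sample = [0.0] * 12_499 + [0.7]
print(activation_density(sample))
```

Under this definition, a density of 0.008% means the feature is nonzero on roughly 1 token in 12,500, so it is a very sparse feature.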