INDEX
Explanations
phrases related to sensitive or confidential information
Negative Logits
ecake     -0.64
ULTS      -0.62
lane      -0.61
orio      -0.61
owered    -0.60
Pis       -0.60
potato    -0.59
Hatch     -0.59
Beard     -0.59
Bass      -0.59
Positive Logits
istic       0.99
ity         0.98
izes        0.96
izing       0.93
ized        0.93
isations    0.90
izations    0.88
ism         0.88
ities       0.88
ization     0.87
Activations Density: 0.020%