Explanations
proper nouns, particularly names of people and places
references to specific individuals, locations, and concepts related to privacy issues
Negative Logits
=-=-=-=-=-=-=-=-  -0.88
=-=-=-=-          -0.64
belief            -0.63
pez               -0.60
flyers            -0.59
observable        -0.58
informative       -0.57
careful           -0.57
harmless          -0.57
STATS             -0.57
Positive Logits
batch   1.34
Cumber  1.24
itial   0.98
bies    0.92
lin     0.91
land    0.89
ilant   0.87
shire   0.86
stall   0.85
berry   0.83
Activations Density 0.006%