INDEX
Explanations
specific strings of characters or acronyms that are unique and repeated multiple times
references to specific organizations or brands
Negative Logits
tein          -0.65
plates        -0.64
bell          -0.63
birds         -0.63
pora          -0.61
kiss          -0.61
xon           -0.61
iris          -0.61
Constructed   -0.60
strength      -0.59
Positive Logits
UD            1.05
PDATE         0.95
nesday        0.93
iamond        0.92
IFIED         0.91
PLIC          0.91
elta          0.87
BLE           0.84
ITY           0.83
geon          0.81
Activations Density 0.012%