Explanations
references to blockades or barriers
Negative Logits
Token        Logit
827          -0.17
ikt          -0.16
zion         -0.16
ullan        -0.16
imli         -0.15
isin         -0.15
(not shown)  -0.15
ffects       -0.15
fov          -0.15
onde         -0.15
Positive Logits
Token        Logit
buster       0.40
busters      0.38
aded         0.38
ades         0.35
quote        0.35
age          0.33
ages         0.29
ading        0.29
chains       0.29
quotes       0.27
Activations Density: 0.020%